Commit Graph

25370 Commits

Author SHA1 Message Date
Baoquan He
6de421198c x86/apic, ACPI: Remove the repeated lapic address override entry parsing
The ACPI MADT has a 32-bit field providing lapic address at which
each processor can access its lapic information. MADT also contains
an optional entry to provide a 64-bit address to override the 32-bit
one. However the current code does the lapic address override entry
parsing twice. One is in early_acpi_boot_init() because AMD NUMA need
get boot_cpu_id earlier. The other is in acpi_boot_init() which parses
all MADT entries.

So in this patch we remove the repeated code in the 2nd part.

Meanwhile print lapic override entry information like other MADT entry,
this will be added to boot log.

This patch is not supposed to change any runtime behavior, other than
improving kernel messages.

Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-acpi@vger.kernel.org
Cc: rjw@rjwysocki.net
Link: http://lkml.kernel.org/r/1470985033-22493-2-git-send-email-bhe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-15 08:53:37 +02:00
Baoquan He
a91bf718db x86/mm/numa: Open code function early_get_boot_cpu_id()
Previously early_acpi_boot_init() was called in early_get_boot_cpu_id()
to get the value for boot_cpu_physical_apicid. Now early_acpi_boot_init()
has been taken out and moved to setup_arch(), the name of
early_get_boot_cpu_id() doesn't match its implementation anymore, and
only the getting boot-time SMP configuration code was left.

So in this patch we open code it.

Also move the smp_found_config check into default_get_smp_config to
simplify code, because both early_get_smp_config() and get_smp_config()
call x86_init.mpparse.get_smp_config().

Also remove the redundent CONFIG_X86_MPPARSE #ifdef check when we call
early_get_smp_config().

Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-acpi@vger.kernel.org
Cc: rjw@rjwysocki.net
Link: http://lkml.kernel.org/r/1470985033-22493-1-git-send-email-bhe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-15 08:51:54 +02:00
Linus Torvalds
9710cb6624 Power management fixes for v4.8-rc2
- Fix the x86 identity mapping creation helpers to avoid the
    assumption that the base address of the mapping will always be
    aligned at the PGD level, as it may be aligned at the PUD level
    if address space randomization is enabled (Rafael Wysocki).
 
  - Fix the hibernation core to avoid executing tracing functions
    before restoring the processor state completely during resume
    (Thomas Garnier).
 
  - Fix a recently introduced regression in the powernv cpufreq
    driver that causes it to crash due to an out-of-bounds array
    access (Akshay Adiga).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJXrjxTAAoJEILEb/54YlRxhsAP/RHGfc0DtkvZyJPfW5eAT73t
 LihmOFtOeGF6Bo0pyM1YnGW4DdIgfnfBYbFSrKlorfveVikK1QkgcEb69XxJwhjW
 i/75Gwy5sLhdjzmGVV7kpmozhwSo4gbfW6q4rJ3x3FEWxMcLbMPAA4AlJq0kVdRm
 CfwTS7YIx/zCWWJTTL8CW0WuVoVOYKuJThCd/HwuwBF1Y8pqg5XAmeyDH2HzQDbH
 OdR4dLjS2xki0f2z1TdAUeSVn8FcuRoH6e/sF5v8T/3I2LdbME3QiCf9uYkeyWJ3
 vhUM40x6O+lB84HdsZjXQqbX/7lZmDj5bgcyPFf2WA/WOf12Y7OquQSc/yKasOrK
 mNFPDUyl+hbUiD5BvDQES/HOxNLFkekARFEb2Ud4HUrN2nIbEghDRcQ5zP6/Nf9o
 Cht8kS/OYe7PeMWXPXDX+zb8Fi8O5jz/9GJ97h6gYKBcaLPbuxUNkhxu5ikIGK+f
 CgefgdpNWS1EdooYmmSFHRyY8RxQjuw7l0CJh7TpTJJFgthr7iCN2A7UQqKlt/zU
 ARqnsUSRQcvjQs23tw8fPwRzUEuynW4udqVNM5XnvNu46KGWqkRgCVMmO6lNrIl6
 v/+S8hLVFJH0t00Y+ZGvh0YcGHR65S1CMdNAuMxd1Gylr/Y3neRun0hHI6qDA19N
 ErPNMydb6BSY+vqcO/i1
 =DWxX
 -----END PGP SIGNATURE-----

Merge tag 'pm-4.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "Two hibernation fixes allowing it to work with the recently added
  randomization of the kernel identity mapping base on x86-64 and one
  cpufreq driver regression fix.

  Specifics:

   - Fix the x86 identity mapping creation helpers to avoid the
     assumption that the base address of the mapping will always be
     aligned at the PGD level, as it may be aligned at the PUD level if
     address space randomization is enabled (Rafael Wysocki).

   - Fix the hibernation core to avoid executing tracing functions
     before restoring the processor state completely during resume
     (Thomas Garnier).

   - Fix a recently introduced regression in the powernv cpufreq driver
     that causes it to crash due to an out-of-bounds array access
     (Akshay Adiga)"

* tag 'pm-4.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM / hibernate: Restore processor state before using per-CPU variables
  x86/power/64: Always create temporary identity mapping correctly
  cpufreq: powernv: Fix crash in gpstate_timer_handler()
2016-08-12 16:23:58 -07:00
Linus Torvalds
01ea443982 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "This is bigger than usual - the reason is partly a pent-up stream of
  fixes after the merge window and partly accidental.  The fixes are:

   - five patches to fix a boot failure on Andy Lutomirsky's laptop
   - four SGI UV platform fixes
   - KASAN fix
   - warning fix
   - documentation update
   - swap entry definition fix
   - pkeys fix
   - irq stats fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic/x2apic, smp/hotplug: Don't use before alloc in x2apic_cluster_probe()
  x86/efi: Allocate a trampoline if needed in efi_free_boot_services()
  x86/boot: Rework reserve_real_mode() to allow multiple tries
  x86/boot: Defer setup_real_mode() to early_initcall time
  x86/boot: Synchronize trampoline_cr4_features and mmu_cr4_features directly
  x86/boot: Run reserve_bios_regions() after we initialize the memory map
  x86/irq: Do not substract irq_tlb_count from irq_call_count
  x86/mm: Fix swap entry comment and macro
  x86/mm/kaslr: Fix -Wformat-security warning
  x86/mm/pkeys: Fix compact mode by removing protection keys' XSAVE buffer manipulation
  x86/build: Reduce the W=1 warnings noise when compiling x86 syscall tables
  x86/platform/UV: Fix kernel panic running RHEL kdump kernel on UV systems
  x86/platform/UV: Fix problem with UV4 BIOS providing incorrect PXM values
  x86/platform/UV: Fix bug with iounmap() of the UV4 EFI System Table causing a crash
  x86/platform/UV: Fix problem with UV4 Socket IDs not being contiguous
  x86/entry: Clarify the RF saving/restoring situation with SYSCALL/SYSRET
  x86/mm: Disable preemption during CR3 read+write
  x86/mm/KASLR: Increase BRK pages for KASLR memory randomization
  x86/mm/KASLR: Fix physical memory calculation on KASLR memory randomization
  x86, kasan, ftrace: Put APIC interrupt handlers into .irqentry.text
2016-08-12 14:31:10 -07:00
Linus Torvalds
3bc6d8c155 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
 "Misc fixes: a /dev/rtc regression fix, two APIC timer period
  calibration fixes, an ARM clocksource driver fix and a NOHZ
  power use regression fix"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/hpet: Fix /dev/rtc breakage caused by RTC cleanup
  x86/timers/apic: Inform TSC deadline clockevent device about recalibration
  x86/timers/apic: Fix imprecise timer interrupts by eliminating TSC clockevents frequency roundoff error
  timers: Fix get_next_timer_interrupt() computation
  clocksource/arm_arch_timer: Force per-CPU interrupt to be level-triggered
2016-08-12 13:55:06 -07:00
Rafael J. Wysocki
0aeeb3e73f Merge branches 'pm-sleep' and 'pm-cpufreq'
* pm-sleep:
  PM / hibernate: Restore processor state before using per-CPU variables
  x86/power/64: Always create temporary identity mapping correctly

* pm-cpufreq:
  cpufreq: powernv: Fix crash in gpstate_timer_handler()
2016-08-12 22:53:58 +02:00
Linus Torvalds
ad83242a8f Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Mostly tooling fixes, plus two uncore-PMU fixes, an uprobes fix, a
  perf-cgroups fix and an AUX events fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/uncore: Add enable_box for client MSR uncore
  perf/x86/intel/uncore: Fix uncore num_counters
  uprobes/x86: Fix RIP-relative handling of EVEX-encoded instructions
  perf/core: Set cgroup in CPU contexts for new cgroup events
  perf/core: Fix sideband list-iteration vs. event ordering NULL pointer deference crash
  perf probe ppc64le: Fix probe location when using DWARF
  perf probe: Add function to post process kernel trace events
  tools: Sync cpufeatures headers with the kernel
  toops: Sync tools/include/uapi/linux/bpf.h with the kernel
  tools: Sync cpufeatures.h and vmx.h with the kernel
  perf probe: Support signedness casting
  perf stat: Avoid skew when reading events
  perf probe: Fix module name matching
  perf probe: Adjust map->reloc offset when finding kernel symbol from map
  perf hists: Trim libtraceevent trace_seq buffers
  perf script: Add 'bpf-output' field to usage message
2016-08-12 13:21:18 -07:00
Kan Liang
95f3be7984 perf/x86/intel/uncore: Add enable_box for client MSR uncore
There are bug reports about miscounting uncore counters on some
client machines like Sandybridge, Broadwell and Skylake. It is
very likely to be observed on idle systems.

This issue is caused by a hardware issue. PERF_GLOBAL_CTL could be
cleared after Package C7, and nothing will be count.
The related errata (HSD 158) could be found in:

  www.intel.com/content/dam/www/public/us/en/documents/specification-updates/4th-gen-core-family-desktop-specification-update.pdf

This patch tries to work around this issue by re-enabling PERF_GLOBAL_CTL
in ->enable_box(). The workaround does not cover all cases. It helps for new
events after returning from C7. But it cannot prevent C7, it will still
miscount if a counter is already active.

There is no drawback in leaving it enabled, so it does not need
disable_box() here.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Cc: <stable@vger.kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1470925874-59943-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-12 08:35:05 +02:00
Kan Liang
10e9e7bd59 perf/x86/intel/uncore: Fix uncore num_counters
Some uncore boxes' num_counters value for Haswell server and
Broadwell server are not correct (too large, off by one).

This issue was found by comparing the code with the document. Although
there is no bug report from users yet, accessing non-existent counters
is dangerous and the behavior is undefined: it may cause miscounting or
even crashes.

This patch makes them consistent with the uncore document.

Reported-by: Lukasz Odzioba <lukasz.odzioba@intel.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1470925820-59847-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-12 08:35:04 +02:00
Denys Vlasenko
68187872c7 uprobes/x86: Fix RIP-relative handling of EVEX-encoded instructions
Since instruction decoder now supports EVEX-encoded instructions, two fixes
are needed to correctly handle them in uprobes.

Extended bits for MODRM.rm field need to be sanitized just like we do it
for VEX3, to avoid encoding wrong register for register-relative access.

EVEX has _two_ extended bits: b and x. Theoretically, EVEX.x should be
ignored by the CPU (since GPRs go only up to 15, not 31), but let's be
paranoid here: proper encoding for register-relative access
should have EVEX.x = 1.

Secondly, we should fetch vex.vvvv for EVEX too.
This is now super easy because instruction decoder populates
vex_prefix.bytes[2] for all flavors of (e)vex encodings, even for VEX2.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Cc: <stable@vger.kernel.org> # v4.1+
Fixes: 8a764a875f ("x86/asm/decoder: Create artificial 3rd byte for 2-byte VEX")
Link: http://lkml.kernel.org/r/20160811154521.20469-1-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-12 08:29:24 +02:00
Sebastian Andrzej Siewior
d52c0569ba x86/apic/x2apic, smp/hotplug: Don't use before alloc in x2apic_cluster_probe()
I made a mistake while converting the driver to the hotplug state
machine and as a result x2apic_cluster_probe() was accessing
cpus_in_cluster before allocating it.

This patch fixes it by setting the cpumask after the allocation the
memory succeeded.

While at it, I marked two functions static which are only used within
this file.

Reported-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 6b2c28471d ("x86/x2apic: Convert to CPU hotplug state machine")
Link: http://lkml.kernel.org/r/1470924515-9444-1-git-send-email-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-11 16:35:50 +02:00
Ville Syrjälä
d721b02fd0 drm/i915: Account for TSEG size when determining 865G stolen base
Looks like the TSEG lives just above TOUD, stolen comes after TSEG.

The spec seems somewhat self-contradictory in places, in the ESMRAMC
register desctription it says:
 TSEG Size:
  10=(TOUD + 512 KB) to TOUD
  11 =(TOUD + 1 MB) to TOUD

so that agrees with TSEG being at TOUD. But the example given
elsehwere in the spec says:

 TOUD equals 62.5 MB = 03E7FFFFh
 TSEG selected as 512 KB in size,
 Graphics local memory selected as 1 MB in size
 General System RAM available in system = 62.5 MB
 General system RAM range00000000h to 03E7FFFFh
 TSEG address range03F80000h to 03FFFFFFh
 TSEG pre-allocated from03F80000h to 03FFFFFFh
 Graphics local memory pre-allocated from03E80000h to 03F7FFFFh

so here we have TSEG above stolen.

Real world evidence agrees with the TOUD->TSEG->stolen order however, so
let's fix up the code to account for the TSEG size.

Cc: Taketo Kabe <fdporg@vega.pgw.jp>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: stable@vger.kernel.org
Fixes: 0ad98c74e0 ("drm/i915: Determine the stolen memory base address on gen2")
Fixes: a4dff76924 ("x86/gpu: Add Intel graphics stolen memory quirk for gen2 platforms")
Reported-by: Taketo Kabe <fdporg@vega.pgw.jp>
Tested-by: Taketo Kabe <fdporg@vega.pgw.jp>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96473
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470653919-27251-1-git-send-email-ville.syrjala@linux.intel.com
Link: http://download.intel.com/design/chipsets/datashts/25251405.pdf
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-08-11 17:20:42 +03:00
Alex Thorlton
f72075c9ed x86/platform/uv: Skip UV runtime services mapping in the efi_runtime_disabled case
This problem has actually been in the UV code for a while, but we didn't
catch it until recently, because we had been relying on EFI_OLD_MEMMAP
to allow our systems to boot for a period of time.  We noticed the issue
when trying to kexec a recent community kernel, where we hit this NULL
pointer dereference in efi_sync_low_kernel_mappings():

 [    0.337515] BUG: unable to handle kernel NULL pointer dereference at 0000000000000880
 [    0.346276] IP: [<ffffffff8105df8d>] efi_sync_low_kernel_mappings+0x5d/0x1b0

The problem doesn't show up with EFI_OLD_MEMMAP because we skip the
chunk of setup_efi_state() that sets the efi_loader_signature for the
kexec'd kernel.  When the kexec'd kernel boots, it won't set EFI_BOOT in
setup_arch, so we completely avoid the bug.

We always kexec with noefi on the command line, so this shouldn't be an
issue, but since we're not actually checking for efi_runtime_disabled in
uv_bios_init(), we end up trying to do EFI runtime callbacks when we
shouldn't be. This patch just adds a check for efi_runtime_disabled in
uv_bios_init() so that we don't map in uv_systab when runtime_disabled ==
true.

Signed-off-by: Alex Thorlton <athorlton@sgi.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: <stable@vger.kernel.org> # v4.7
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1470912120-22831-2-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-11 13:55:36 +02:00
Andy Lutomirski
5bc653b731 x86/efi: Allocate a trampoline if needed in efi_free_boot_services()
On my Dell XPS 13 9350 with firmware 1.4.4 and SGX on, if I boot
Fedora 24's grub2-efi off a hard disk, my first 1MB of RAM looks
like:

 efi: mem00: [Runtime Data       |RUN|  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000000000000-0x0000000000000fff] (0MB)
 efi: mem01: [Boot Data          |   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000000001000-0x0000000000027fff] (0MB)
 efi: mem02: [Loader Data        |   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000000028000-0x0000000000029fff] (0MB)
 efi: mem03: [Reserved           |   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x000000000002a000-0x000000000002bfff] (0MB)
 efi: mem04: [Runtime Data       |RUN|  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x000000000002c000-0x000000000002cfff] (0MB)
 efi: mem05: [Loader Data        |   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x000000000002d000-0x000000000002dfff] (0MB)
 efi: mem06: [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x000000000002e000-0x0000000000057fff] (0MB)
 efi: mem07: [Reserved           |   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000000058000-0x0000000000058fff] (0MB)
 efi: mem08: [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000000059000-0x000000000009ffff] (0MB)

My EBDA is at 0x2c000, which blocks off everything from 0x2c000 and
up, and my trampoline is 0x6000 bytes (6 pages), so it doesn't fit
in the loader data range at 0x28000.

Without this patch, it panics due to a failure to allocate the
trampoline.  With this patch, it works:

 [  +0.001744] Base memory trampoline at [ffff880000001000] 1000 size 24576

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mario Limonciello <mario_limonciello@dell.com>
Cc: Matt Fleming <mfleming@suse.de>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/998c77b3bf709f3dfed85cb30701ed1a5d8a438b.1470821230.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-11 13:53:07 +02:00
Andy Lutomirski
5ff3e2c3c3 x86/boot: Rework reserve_real_mode() to allow multiple tries
If reserve_real_mode() fails, panicing immediately means we're
doomed.  Make it safe to try more than once to allocate the
trampoline:

 - Degrade a failure from panic() to pr_info().  (If we make it to
   setup_real_mode() without reserving the trampoline, we'll panic
   them.)

 - Factor out helpers so that platform code can supply a specific
   address to try.

 - Warn if reserve_real_mode() is called after we're done with the
   memblock allocator.  If that were to happen, we would behave
   unpredictably.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mario Limonciello <mario_limonciello@dell.com>
Cc: Matt Fleming <mfleming@suse.de>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/876e383038f3e9971aa72fd20a4f5da05f9d193d.1470821230.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-11 11:15:01 +02:00
Andy Lutomirski
d0de0f685d x86/boot: Defer setup_real_mode() to early_initcall time
There's no need to run setup_real_mode() as early as we run it.
Defer it to the same early_initcall that sets up the page
permissions for the real mode code.

This should be a code size reduction.  More importantly, it give us
a longer window in which we can allocate the real mode trampoline.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mario Limonciello <mario_limonciello@dell.com>
Cc: Matt Fleming <mfleming@suse.de>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/fd62f0da4f79357695e9bf3e365623736b05f119.1470821230.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-11 11:15:00 +02:00
Andy Lutomirski
18bc7bd523 x86/boot: Synchronize trampoline_cr4_features and mmu_cr4_features directly
The initialization process for trampoline_cr4_features and
mmu_cr4_features was confusing.  The intent is for mmu_cr4_features
and *trampoline_cr4_features to stay in sync, but
trampoline_cr4_features is NULL until setup_real_mode() runs.  The
old code synchronized *trampoline_cr4_features *twice*, once in
setup_real_mode() and once in setup_arch().  It also initialized
mmu_cr4_features in setup_real_mode(), which causes the actual value
of mmu_cr4_features to potentially depend on when setup_real_mode()
is called.

With this patch, mmu_cr4_features is initialized directly in
setup_arch(), and *trampoline_cr4_features is synchronized to
mmu_cr4_features when the trampoline is set up.

After this patch, it should be safe to defer setup_real_mode().

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mario Limonciello <mario_limonciello@dell.com>
Cc: Matt Fleming <mfleming@suse.de>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/d48a263f9912389b957dd495a7127b009259ffe0.1470821230.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-11 11:15:00 +02:00
Andy Lutomirski
007b756053 x86/boot: Run reserve_bios_regions() after we initialize the memory map
reserve_bios_regions() is a quirk that reserves memory that we might
otherwise think is available.  There's no need to run it so early,
and running it before we have the memory map initialized with its
non-quirky inputs makes it hard to make reserve_bios_regions() more
intelligent.

Move it right after we populate the memblock state.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mario Limonciello <mario_limonciello@dell.com>
Cc: Matt Fleming <mfleming@suse.de>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/59f58618911005c799c6c9979ce6ae4881d907c2.1470821230.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-11 11:14:59 +02:00
Aaron Lu
82ba4faca1 x86/irq: Do not substract irq_tlb_count from irq_call_count
Since commit:

  52aec3308d ("x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR")

the TLB remote shootdown is done through call function vector. That
commit didn't take care of irq_tlb_count, which a later commit:

  fd0f586972 ("x86: Distinguish TLB shootdown interrupts from other functions call interrupts")

... tried to fix.

The fix assumes every increase of irq_tlb_count has a corresponding
increase of irq_call_count. So the irq_call_count is always bigger than
irq_tlb_count and we could substract irq_tlb_count from irq_call_count.

Unfortunately this is not true for the smp_call_function_single() case.
The IPI is only sent if the target CPU's call_single_queue is empty when
adding a csd into it in generic_exec_single. That means if two threads
are both adding flush tlb csds to the same CPU's call_single_queue, only
one IPI is sent. In other words, the irq_call_count is incremented by 1
but irq_tlb_count is incremented by 2. Over time, irq_tlb_count will be
bigger than irq_call_count and the substract will produce a very large
irq_call_count value due to overflow.

Considering that:

  1) it's not worth to send more IPIs for the sake of accurate counting of
     irq_call_count in generic_exec_single();

  2) it's not easy to tell if the call function interrupt is for TLB
     shootdown in __smp_call_function_single_interrupt().

Not to exclude TLB shootdown from call function count seems to be the
simplest fix and this patch just does that.

This bug was found by LKP's cyclic performance regression tracking recently
with the vm-scalability test suite. I have bisected to commit:

  3dec0ba0be ("mm/rmap: share the i_mmap_rwsem")

This commit didn't do anything wrong but revealed the irq_call_count
problem. IIUC, the commit makes rwc->remap_one in rmap_walk_file
concurrent with multiple threads.  When remap_one is try_to_unmap_one(),
then multiple threads could queue flush TLB to the same CPU but only
one IPI will be sent.

Since the commit was added in Linux v3.19, the counting problem only
shows up from v3.19 onwards.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com>
Link: http://lkml.kernel.org/r/20160811074430.GA18163@aaronlu.sh.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-11 11:14:59 +02:00
Dave Hansen
ace7fab7a6 x86/mm: Fix swap entry comment and macro
A recent patch changed the format of a swap PTE.

The comment explaining the format of the swap PTE is wrong about
the bits used for the swap type field.  Amusingly, the ASCII art
and the patch description are correct, but the comment itself
is wrong.

As I was looking at this, I also noticed that the
SWP_OFFSET_FIRST_BIT has an off-by-one error.  This does not
really hurt anything.  It just wasted a bit of space in the PTE,
giving us 2^59 bytes of addressable space in our swapfiles
instead of 2^60.  But, it doesn't match with the comments, and it
wastes a bit of space, so fix it.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Fixes: 00839ee3b2 ("x86/mm: Move swap offset/type up in PTE to work around erratum")
Link: http://lkml.kernel.org/r/20160810172325.E56AD7DA@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-11 11:04:10 +02:00
Nicolas Iooss
62d16b5a3f x86/mm/kaslr: Fix -Wformat-security warning
debug_putstr() is used to output strings without using printf-like
formatting but debug_putstr(v) is defined as early_printk(v) in
arch/x86/lib/kaslr.c.

This makes clang reports the following warning when building
with -Wformat-security:

    arch/x86/lib/kaslr.c:57:15: warning: format string is not a string
    literal (potentially insecure) [-Wformat-security]
            debug_putstr(purpose);
                         ^~~~~~~

Fix this by using "%s" in early_printk().

Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160806102039.27221-1-nicolas.iooss_linux@m4x.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-11 10:58:12 +02:00
Dave Hansen
b79daf8589 x86/mm/pkeys: Fix compact mode by removing protection keys' XSAVE buffer manipulation
The Memory Protection Keys "rights register" (PKRU) is
XSAVE-managed, and is saved/restored along with the FPU state.

When kernel code accesses FPU regsisters, it does a delicate
dance with preempt.  Otherwise, the context switching code can
get confused as to whether the most up-to-date state is in the
registers themselves or in the XSAVE buffer.

But, PKRU is not a normal FPU register.  Using it does not
generate the normal device-not-available (#NM) exceptions which
means we can not manage it lazily, and the kernel completley
disallows using lazy mode when it is enabled.

The dance with preempt *only* occurs when managing the FPU
lazily.  Since we never manage PKRU lazily, we do not have to do
the dance with preempt; we can access it directly.  Doing it
this way saves a ton of complicated code (and is faster too).

Further, the XSAVES reenabling failed to patch a bit of code
in fpu__xfeature_set_state() the checked for compacted buffers.
That check caused fpu__xfeature_set_state() to silently refuse to
work when the kernel is using compacted XSAVE buffers.  This
broke execute-only and future pkey_mprotect() support when using
compact XSAVE buffers.

But, removing fpu__xfeature_set_state() gets rid of this issue,
in addition to the nice cleanup and speedup.

This fixes the same thing as a fix that Sai posted:

  https://lkml.org/lkml/2016/7/25/637

The fix that he posted is a much more obviously correct, but I
think we should just do this instead.

Reported-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-Cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20160727232040.7D060DAD@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10 16:12:26 +02:00
Thomas Garnier
25dfe47853 x86/mm/64: Enable KASLR for vmemmap memory region
Add vmemmap in the list of randomized memory regions.

The vmemmap region holds a representation of the physical memory (through
a struct page array). An attacker could use this region to disclose the
kernel memory layout (walking the page linked list).

Signed-off-by: Thomas Garnier <thgarnie@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-hardening@lists.openwall.com
Link: http://lkml.kernel.org/r/1469635196-122447-1-git-send-email-thgarnie@google.com
[ Minor edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10 16:10:06 +02:00
Valdis Kletnieks
5e44258d16 x86/build: Reduce the W=1 warnings noise when compiling x86 syscall tables
Building an X86_64 kernel with W=1 throws a total of 9,948 lines of warnings of
this form for both 32-bit and 64-bit syscall tables. Given that the entire rest
of the build for my config only generates 8,375 lines of output, this is a big
reduction in the warnings generated.

The warnings follow this pattern:

  ./arch/x86/include/generated/asm/syscalls_32.h:885:21: warning: initialized field overwritten [-Woverride-init]
   __SYSCALL_I386(379, compat_sys_pwritev2, )
                     ^
  arch/x86/entry/syscall_32.c:13:46: note: in definition of macro '__SYSCALL_I386'
   #define __SYSCALL_I386(nr, sym, qual) [nr] = sym,
                                              ^~~
  ./arch/x86/include/generated/asm/syscalls_32.h:885:21: note: (near initialization for 'ia32_sys_call_table[379]')
   __SYSCALL_I386(379, compat_sys_pwritev2, )
                     ^
  arch/x86/entry/syscall_32.c:13:46: note: in definition of macro '__SYSCALL_I386'
   #define __SYSCALL_I386(nr, sym, qual) [nr] = sym,

Since we intentionally build the syscall tables this way, ignore that one
warning in the two files.

Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/7464.1470021890@turing-police.cc.vt.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10 16:05:16 +02:00
Mike Travis
5a52e8f822 x86/platform/UV: Fix kernel panic running RHEL kdump kernel on UV systems
The latest UV kernel support panics when RHEL7 kexec's the kdump kernel
to make a dumpfile.  This patch fixes the problem by turning off all UV
support if NUMA is off.

Tested-by: Frank Ramsay <framsay@sgi.com>
Tested-by: John Estabrook <estabrook@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Reviewed-by: Nathan Zimmer <nzimmer@sgi.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andrew Banman <abanman@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160801184050.577755634@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10 15:55:39 +02:00
Mike Travis
22ac2bca92 x86/platform/UV: Fix problem with UV4 BIOS providing incorrect PXM values
There are some circumstances where the UV4 BIOS cannot provide the
correct Proximity Node values to associate with specific Sockets and
Physical Nodes.  The decision was made to remove these values from BIOS
and for the kernel to get these values from the standard ACPI tables.

Tested-by: Frank Ramsay <framsay@sgi.com>
Tested-by: John Estabrook <estabrook@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Reviewed-by: Nathan Zimmer <nzimmer@sgi.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andrew Banman <abanman@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160801184050.414210079@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10 15:55:38 +02:00
Mike Travis
e363d24c2b x86/platform/UV: Fix bug with iounmap() of the UV4 EFI System Table causing a crash
Save the uv_systab::size field before doing the iounmap()
of the struct pointer, to avoid a NULL dereference crash.

Tested-by: Frank Ramsay <framsay@sgi.com>
Tested-by: John Estabrook <estabrook@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Reviewed-by: Nathan Zimmer <nzimmer@sgi.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andrew Banman <abanman@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160801184050.250424783@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10 15:55:38 +02:00
Mike Travis
054f621fd5 x86/platform/UV: Fix problem with UV4 Socket IDs not being contiguous
The UV4 Socket IDs are not guaranteed to equate to Node values which
can cause the GAM (Global Addressable Memory) table lookups to fail.
Fix this by using an independent index into the GAM table instead of
the Socket ID to reference the base address.

Tested-by: Frank Ramsay <framsay@sgi.com>
Tested-by: John Estabrook <estabrook@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Reviewed-by: Nathan Zimmer <nzimmer@sgi.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andrew Banman <abanman@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160801184050.048755337@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10 15:55:38 +02:00
Borislav Petkov
3e03530587 x86/entry: Clarify the RF saving/restoring situation with SYSCALL/SYSRET
Clarify why exactly RF cannot be restored properly by SYSRET to avoid
confusion.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160803171429.GA2590@nazgul.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10 15:53:43 +02:00
Boris Ostrovsky
aa877175e7 cpu/hotplug: Prevent alloc/free of irq descriptors during CPU up/down (again)
Now that Xen no longer allocates irqs in _cpu_up() we can restore
commit:

  a899418167 ("hotplug: Prevent alloc/free of irq descriptors during cpu up/down")

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: david.vrabel@citrix.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1470244948-17674-3-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10 15:42:57 +02:00
Sebastian Andrzej Siewior
5cf0791da5 x86/mm: Disable preemption during CR3 read+write
There's a subtle preemption race on UP kernels:

Usually current->mm (and therefore mm->pgd) stays the same during the
lifetime of a task so it does not matter if a task gets preempted during
the read and write of the CR3.

But then, there is this scenario on x86-UP:

TaskA is in do_exit() and exit_mm() sets current->mm = NULL followed by:

 -> mmput()
 -> exit_mmap()
 -> tlb_finish_mmu()
 -> tlb_flush_mmu()
 -> tlb_flush_mmu_tlbonly()
 -> tlb_flush()
 -> flush_tlb_mm_range()
 -> __flush_tlb_up()
 -> __flush_tlb()
 ->  __native_flush_tlb()

At this point current->mm is NULL but current->active_mm still points to
the "old" mm.

Let's preempt taskA _after_ native_read_cr3() by taskB. TaskB has its
own mm so CR3 has changed.

Now preempt back to taskA. TaskA has no ->mm set so it borrows taskB's
mm and so CR3 remains unchanged. Once taskA gets active it continues
where it was interrupted and that means it writes its old CR3 value
back. Everything is fine because userland won't need its memory
anymore.

Now the fun part:

Let's preempt taskA one more time and get back to taskB. This
time switch_mm() won't do a thing because oldmm (->active_mm)
is the same as mm (as per context_switch()). So we remain
with a bad CR3 / PGD and return to userland.

The next thing that happens is handle_mm_fault() with an address for
the execution of its code in userland. handle_mm_fault() realizes that
it has a PTE with proper rights so it returns doing nothing. But the
CPU looks at the wrong PGD and insists that something is wrong and
faults again. And again. And one more time…

This pagefault circle continues until the scheduler gets tired of it and
puts another task on the CPU. It gets little difficult if the task is a
RT task with a high priority. The system will either freeze or it gets
fixed by the software watchdog thread which usually runs at RT-max prio.
But waiting for the watchdog will increase the latency of the RT task
which is no good.

Fix this by disabling preemption across the critical code section.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1470404259-26290-1-git-send-email-bigeasy@linutronix.de
[ Prettified the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10 15:37:16 +02:00
Kees Cook
404f6aac9b x86: Apply more __ro_after_init and const
Guided by grsecurity's analogous __read_only markings in arch/x86,
this applies several uses of __ro_after_init to structures that are
only updated during __init, and const for some structures that are
never updated.  Additionally extends __init markings to some functions
that are only used during __init, and cleans up some missing C99 style
static initializers.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Brown <david.brown@linaro.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-hardening@lists.openwall.com
Link: http://lkml.kernel.org/r/20160808232906.GA29731@www.outflux.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10 14:55:05 +02:00
Thomas Garnier
fb754f958f x86/mm/KASLR: Increase BRK pages for KASLR memory randomization
Default implementation expects 6 pages maximum are needed for low page
allocations. If KASLR memory randomization is enabled, the worse case
of e820 layout would require 12 pages (no large pages). It is due to the
PUD level randomization and the variable e820 memory layout.

This bug was found while doing extensive testing of KASLR memory
randomization on different type of hardware.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
Cc: Aleksey Makarov <aleksey.makarov@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lv Zheng <lv.zheng@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: kernel-hardening@lists.openwall.com
Fixes: 021182e52f ("Enable KASLR for physical mapping memory regions")
Link: http://lkml.kernel.org/r/1470762665-88032-2-git-send-email-thgarnie@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10 14:45:19 +02:00
Thomas Garnier
c7d2361f75 x86/mm/KASLR: Fix physical memory calculation on KASLR memory randomization
Initialize KASLR memory randomization after max_pfn is initialized. Also
ensure the size is rounded up. It could create problems on machines
with more than 1Tb of memory on certain random addresses.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
Cc: Aleksey Makarov <aleksey.makarov@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lv Zheng <lv.zheng@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: kernel-hardening@lists.openwall.com
Fixes: 021182e52f ("Enable KASLR for physical mapping memory regions")
Link: http://lkml.kernel.org/r/1470762665-88032-1-git-send-email-thgarnie@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10 14:45:19 +02:00
Arnd Bergmann
22cc1ca3c5 x86/hpet: Fix /dev/rtc breakage caused by RTC cleanup
Ville Syrjälä reports "The first time I run hwclock after rebooting
I get this:

 open("/dev/rtc", O_RDONLY)              = 3
 ioctl(3, PHN_SET_REGS or RTC_UIE_ON, 0) = 0
 select(4, [3], NULL, NULL, {10, 0})     = 0 (Timeout)
 ioctl(3, PHN_NOT_OH or RTC_UIE_OFF, 0)  = 0
 close(3)                                = 0

On all subsequent runs I get this:

 open("/dev/rtc", O_RDONLY)              = 3
 ioctl(3, PHN_SET_REGS or RTC_UIE_ON, 0) = -1 EINVAL (Invalid argument)
 ioctl(3, RTC_RD_TIME, 0x7ffd76b3ae70)   = -1 EINVAL (Invalid argument)
 close(3)                                = 0"

This was caused by a stupid typo in a patch that should have been
a simple rename to move around contents of a header file, but
accidentally wrote zeroes into the rtc rather than reading from
it:

  463a86304c ("char/genrtc: x86: remove remnants of asm/rtc.h")

Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: rtc-linux@googlegroups.com
Fixes: 463a86304c ("char/genrtc: x86: remove remnants of asm/rtc.h")
Link: http://lkml.kernel.org/r/20160809195528.1604312-1-arnd@arndb.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10 14:37:06 +02:00
Ingo Molnar
fdbdfefbab Merge branch 'linus' into timers/urgent, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10 14:36:23 +02:00
Alexander Potapenko
469f002312 x86, kasan, ftrace: Put APIC interrupt handlers into .irqentry.text
Dmitry Vyukov has reported unexpected KASAN stackdepot growth:

  https://github.com/google/kasan/issues/36

... which is caused by the APIC handlers not being present in .irqentry.text:

When building with CONFIG_FUNCTION_GRAPH_TRACER=y or CONFIG_KASAN=y, put the
APIC interrupt handlers into the .irqentry.text section. This is needed
because both KASAN and function graph tracer use __irqentry_text_start and
__irqentry_text_end to determine whether a function is an IRQ entry point.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: aryabinin@virtuozzo.com
Cc: kasan-dev@googlegroups.com
Cc: kcc@google.com
Cc: rostedt@goodmis.org
Link: http://lkml.kernel.org/r/1468575763-144889-1-git-send-email-glider@google.com
[ Minor edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10 14:19:33 +02:00
Peter Zijlstra
3e2c1a67d6 perf/x86/intel: Clean up LBR state tracking
The lbr_context logic confused me; it appears to me to try and do the
same thing the pmu::sched_task() callback does now, but limited to
per-task events.

So rip it out. Afaict this should also improve performance, because I
think the current code can end up doing lbr_reset() twice, once from
the pmu::add() and then again from pmu::sched_task(), and MSR writes
(all 3*16 of them) are expensive!!

While thinking through the cases that need the reset it occured to me
the first install of an event in an active context needs to reset the
LBR (who knows what crap is in there), but detecting this case is
somewhat hard.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10 13:13:27 +02:00
Peter Zijlstra
a5dcff628a perf/x86/intel: Remove redundant test from intel_pmu_lbr_add()
By the time we call pmu::add(), event->ctx must be set, and we
even already rely on this, so remove that test from
intel_pmu_lbr_add().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10 13:13:26 +02:00
Peter Zijlstra
c3a61a2c5c perf/x86/intel: Eliminate dead code in intel_pmu_lbr_del()
Since pmu::del() is always called under perf_pmu_disable(), the block
conditional on cpuc->enabled is dead.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10 13:13:25 +02:00
Peter Zijlstra
68f7082ffb perf/x86: Ensure perf_sched_cb_{inc,dec}() is only called from pmu::{add,del}()
Currently perf_sched_cb_{inc,dec}() are called from
pmu::{start,stop}(), which has the problem that this can happen from
NMI context, this is making it hard to optimize perf_pmu_sched_task().

Furthermore, we really only need this accounting on pmu::{add,del}(),
so doing it from pmu::{start,stop}() is doing more work than we really
need.

Introduce x86_pmu::{add,del}() and wire up the LBR and PEBS.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10 13:13:24 +02:00
Peter Zijlstra
09e61b4f78 perf/x86/intel: Rework the large PEBS setup code
In order to allow optimizing perf_pmu_sched_task() we must ensure
perf_sched_cb_{inc,dec}() are no longer called from NMI context; this
means that pmu::{start,stop}() can no longer use them.

Prepare for this by reworking the whole large PEBS setup code.

The current code relied on the cpuc->pebs_enabled state, however since
that reflects the current active state as per pmu::{start,stop}() we
can no longer rely on this.

Introduce two counters: cpuc->n_pebs and cpuc->n_large_pebs which
count the total number of PEBS events and the number of PEBS events
that have FREERUNNING set, resp.. With this we can tell if the current
setup requires a single record interrupt threshold or can use a larger
buffer.

This also improves the code in that it re-enables the large threshold
once the PEBS event that required single record gets removed.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10 13:13:24 +02:00
Nicolai Stange
6731b0d611 x86/timers/apic: Inform TSC deadline clockevent device about recalibration
This patch eliminates a source of imprecise APIC timer interrupts,
which imprecision may result in double interrupts or even late
interrupts.

The TSC deadline clockevent devices' configuration and registration
happens before the TSC frequency calibration is refined in
tsc_refine_calibration_work().

This results in the TSC clocksource and the TSC deadline clockevent
devices being configured with slightly different frequencies: the former
gets the refined one and the latter are configured with the inaccurate
frequency detected earlier by means of the "Fast TSC calibration using PIT".

Within the APIC code, introduce the notifier function
lapic_update_tsc_freq() which reconfigures all per-CPU TSC deadline
clockevent devices with the current tsc_khz.

Call it from the TSC code after TSC calibration refinement has happened.

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Christopher S. Hall <christopher.s.hall@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: http://lkml.kernel.org/r/20160714152255.18295-3-nicstange@gmail.com
[ Pushed #ifdef CONFIG_X86_LOCAL_APIC into header, improved changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10 12:38:12 +02:00
Nicolai Stange
1a9e4c564a x86/timers/apic: Fix imprecise timer interrupts by eliminating TSC clockevents frequency roundoff error
I noticed the following bug/misbehavior on certain Intel systems: with a
single task running on a NOHZ CPU on an Intel Haswell, I recognized
that I did not only get the one expected local_timer APIC interrupt, but
two per second at minimum. (!)

Further tracing showed that the first one precedes the programmed deadline
by up to ~50us and hence, it did nothing except for reprogramming the TSC
deadline clockevent device to trigger shortly thereafter again.

The reason for this is imprecise calibration, the timeout we program into
the APIC results in 'too short' timer interrupts. The core (hr)timer code
notices this (because it has a precise ktime source and sees the short
interrupt) and fixes it up by programming an additional very short
interrupt period.

This is obviously suboptimal.

The reason for the imprecise calibration is twofold, and this patch
fixes the first reason:

In setup_APIC_timer(), the registered clockevent device's frequency
is calculated by first dividing tsc_khz by TSC_DIVISOR and multiplying
it with 1000 afterwards:

  (tsc_khz / TSC_DIVISOR) * 1000

The multiplication with 1000 is done for converting from kHz to Hz and the
division by TSC_DIVISOR is carried out in order to make sure that the final
result fits into an u32.

However, with the order given in this calculation, the roundoff error
introduced by the division gets magnified by a factor of 1000 by the
following multiplication.

To fix it, reversing the order of the division and the multiplication a la:

  (tsc_khz * 1000) / TSC_DIVISOR

... reduces the roundoff error already.

Furthermore, if TSC_DIVISOR divides 1000, associativity holds:

  (tsc_khz * 1000) / TSC_DIVISOR = tsc_khz * (1000 / TSC_DIVISOR)

and thus, the roundoff error even vanishes and the whole operation can be
carried out within 32 bits.

The powers of two that divide 1000 are 2, 4 and 8. A value of 8 for
TSC_DIVISOR still allows for TSC frequencies up to
2^32 / 10^9ns * 8 = 34.4GHz which is way larger than anything to expect
in the next years.

Thus we also replace the current TSC_DIVISOR value of 32 by 8. Reverse
the order of the divison and the multiplication in the calculation of
the registered clockevent device's frequency.

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Christopher S. Hall <christopher.s.hall@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: http://lkml.kernel.org/r/20160714152255.18295-2-nicstange@gmail.com
[ Improved changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10 12:37:38 +02:00
Linus Torvalds
1eccfa090e Implements HARDENED_USERCOPY verification of copy_to_user/copy_from_user
bounds checking for most architectures on SLAB and SLUB.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 Comment: Kees Cook <kees@outflux.net>
 
 iQIcBAABCgAGBQJXl9tlAAoJEIly9N/cbcAm5BoP/ikTtDp2bFw1sn92yHTnIWzl
 O+dcKVAeRgjfnSvPfb1JITpaM58exQSaDsPBeR0DbVzU1zDdhLcwHHiQupFh98Ka
 vBZthbrlL/u4NB26enEEW0iyA32BsxYBMnIu0z5ux9RbZflmQwGQ0c0rvy3dJ7/b
 FzB5ayVST5y/a0m6/sImeeExh78GU9rsMb1XmJRMwlJAy6miDz/F9TP0LnuW6PhG
 J5XC99ygNJS1pQBLACRsrZw6ImgBxXnWCok6tWPMxFfD+rJBU2//wqS+HozyMWHL
 iYP7+ytVo/ZVok4114X/V4Oof3a6wqgpBuYrivJ228QO+UsLYbYLo6sZ8kRK7VFm
 9GgHo/8rWB1T9lBbSaa7UL5r0dVNNLjFGS42vwV+YlgUMQ1A35VRojO0jUnJSIQU
 Ug1IxKmylLd0nEcwD8/l3DXeQABsfL8GsoKW0OtdTZtW4RND4gzq34LK6t7hvayF
 kUkLg1OLNdUJwOi16M/rhugwYFZIMfoxQtjkRXKWN4RZ2QgSHnx2lhqNmRGPAXBG
 uy21wlzUTfLTqTpoeOyHzJwyF2qf2y4nsziBMhvmlrUvIzW1LIrYUKCNT4HR8Sh5
 lC2WMGYuIqaiu+NOF3v6CgvKd9UW+mxMRyPEybH8mEgfm+FLZlWABiBjIUpSEZuB
 JFfuMv1zlljj/okIQRg8
 =USIR
 -----END PGP SIGNATURE-----

Merge tag 'usercopy-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull usercopy protection from Kees Cook:
 "Tbhis implements HARDENED_USERCOPY verification of copy_to_user and
  copy_from_user bounds checking for most architectures on SLAB and
  SLUB"

* tag 'usercopy-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  mm: SLUB hardened usercopy support
  mm: SLAB hardened usercopy support
  s390/uaccess: Enable hardened usercopy
  sparc/uaccess: Enable hardened usercopy
  powerpc/uaccess: Enable hardened usercopy
  ia64/uaccess: Enable hardened usercopy
  arm64/uaccess: Enable hardened usercopy
  ARM: uaccess: Enable hardened usercopy
  x86/uaccess: Enable hardened usercopy
  mm: Hardened usercopy
  mm: Implement stack frame object validation
  mm: Add is_migrate_cma_page
2016-08-08 14:48:14 -07:00
Rafael J. Wysocki
e4630fdd47 x86/power/64: Always create temporary identity mapping correctly
The low-level resume-from-hibernation code on x86-64 uses
kernel_ident_mapping_init() to create the temoprary identity mapping,
but that function assumes that the offset between kernel virtual
addresses and physical addresses is aligned on the PGD level.

However, with a randomized identity mapping base, it may be aligned
on the PUD level and if that happens, the temporary identity mapping
created by set_up_temporary_mappings() will not reflect the actual
kernel identity mapping and the image restoration will fail as a
result (leading to a kernel panic most of the time).

To fix this problem, rework kernel_ident_mapping_init() to support
unaligned offsets between KVA and PA up to the PMD level and make
set_up_temporary_mappings() use it as approprtiate.

Reported-and-tested-by: Thomas Garnier <thgarnie@google.com>
Reported-by: Borislav Petkov <bp@suse.de>
Suggested-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
2016-08-08 22:04:30 +02:00
Linus Torvalds
1bd4403d86 unsafe_[get|put]_user: change interface to use a error target label
When I initially added the unsafe_[get|put]_user() helpers in commit
5b24a7a2aa ("Add 'unsafe' user access functions for batched
accesses"), I made the mistake of modeling the interface on our
traditional __[get|put]_user() functions, which return zero on success,
or -EFAULT on failure.

That interface is fairly easy to use, but it's actually fairly nasty for
good code generation, since it essentially forces the caller to check
the error value for each access.

In particular, since the error handling is already internally
implemented with an exception handler, and we already use "asm goto" for
various other things, we could fairly easily make the error cases just
jump directly to an error label instead, and avoid the need for explicit
checking after each operation.

So switch the interface to pass in an error label, rather than checking
the error value in the caller.  Best do it now before we start growing
more users (the signal handling code in particular would be a good place
to use the new interface).

So rather than

	if (unsafe_get_user(x, ptr))
		... handle error ..

the interface is now

	unsafe_get_user(x, ptr, label);

where an error during the user mode fetch will now just cause a jump to
'label' in the caller.

Right now the actual _implementation_ of this all still ends up being a
"if (err) goto label", and does not take advantage of any exception
label tricks, but for "unsafe_put_user()" in particular it should be
fairly straightforward to convert to using the exception table model.

Note that "unsafe_get_user()" is much harder to convert to a clever
exception table model, because current versions of gcc do not allow the
use of "asm goto" (for the exception) with output values (for the actual
value to be fetched).  But that is hopefully not a limitation in the
long term.

[ Also note that it might be a good idea to switch unsafe_get_user() to
  actually _return_ the value it fetches from user space, but this
  commit only changes the error handling semantics ]

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-08 13:02:01 -07:00
Ville Syrjälä
65ea11ec6a x86/hweight: Don't clobber %rdi
The caller expects %rdi to remain intact, push+pop it make that happen.

Fixes the following kind of explosions on my core2duo machine when
trying to reboot or shut down:

  general protection fault: 0000 [#1] PREEMPT SMP
  Modules linked in: i915 i2c_algo_bit drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea drm netconsole configfs binfmt_misc iTCO_wdt psmouse pcspkr snd_hda_codec_idt e100 coretemp hwmon snd_hda_codec_generic i2c_i801 mii i2c_smbus lpc_ich mfd_core snd_hda_intel uhci_hcd snd_hda_codec snd_hwdep snd_hda_core ehci_pci 8250 ehci_hcd snd_pcm 8250_base usbcore evdev serial_core usb_common parport_pc parport snd_timer snd soundcore
  CPU: 0 PID: 3070 Comm: reboot Not tainted 4.8.0-rc1-perf-dirty #69
  Hardware name:                  /D946GZIS, BIOS TS94610J.86A.0087.2007.1107.1049 11/07/2007
  task: ffff88012a0b4080 task.stack: ffff880123850000
  RIP: 0010:[<ffffffff81003c92>]  [<ffffffff81003c92>] x86_perf_event_update+0x52/0xc0
  RSP: 0018:ffff880123853b60  EFLAGS: 00010087
  RAX: 0000000000000001 RBX: ffff88012fc0a3c0 RCX: 000000000000001e
  RDX: 0000000000000000 RSI: 0000000040000000 RDI: ffff88012b014800
  RBP: ffff880123853b88 R08: ffffffffffffffff R09: 0000000000000000
  R10: ffffea0004a012c0 R11: ffffea0004acedc0 R12: ffffffff80000001
  R13: ffff88012b0149c0 R14: ffff88012b014800 R15: 0000000000000018
  FS:  00007f8b155cd700(0000) GS:ffff88012fc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f8b155f5000 CR3: 000000012a2d7000 CR4: 00000000000006f0
  Stack:
   ffff88012fc0a3c0 ffff88012b014800 0000000000000004 0000000000000001
   ffff88012fc1b750 ffff880123853bb0 ffffffff81003d59 ffff88012b014800
   ffff88012fc0a3c0 ffff88012b014800 ffff880123853bd8 ffffffff81003e13
  Call Trace:
   [<ffffffff81003d59>] x86_pmu_stop+0x59/0xd0
   [<ffffffff81003e13>] x86_pmu_del+0x43/0x140
   [<ffffffff8111705d>] event_sched_out.isra.105+0xbd/0x260
   [<ffffffff8111738d>] __perf_remove_from_context+0x2d/0xb0
   [<ffffffff8111745d>] __perf_event_exit_context+0x4d/0x70
   [<ffffffff810c8826>] generic_exec_single+0xb6/0x140
   [<ffffffff81117410>] ? __perf_remove_from_context+0xb0/0xb0
   [<ffffffff81117410>] ? __perf_remove_from_context+0xb0/0xb0
   [<ffffffff810c898f>] smp_call_function_single+0xdf/0x140
   [<ffffffff81113d27>] perf_event_exit_cpu_context+0x87/0xc0
   [<ffffffff81113d73>] perf_reboot+0x13/0x40
   [<ffffffff8107578a>] notifier_call_chain+0x4a/0x70
   [<ffffffff81075ad7>] __blocking_notifier_call_chain+0x47/0x60
   [<ffffffff81075b06>] blocking_notifier_call_chain+0x16/0x20
   [<ffffffff81076a1d>] kernel_restart_prepare+0x1d/0x40
   [<ffffffff81076ae2>] kernel_restart+0x12/0x60
   [<ffffffff81076d56>] SYSC_reboot+0xf6/0x1b0
   [<ffffffff811a823c>] ? mntput_no_expire+0x2c/0x1b0
   [<ffffffff811a83e4>] ? mntput+0x24/0x40
   [<ffffffff811894fc>] ? __fput+0x16c/0x1e0
   [<ffffffff811895ae>] ? ____fput+0xe/0x10
   [<ffffffff81072fc3>] ? task_work_run+0x83/0xa0
   [<ffffffff81001623>] ? exit_to_usermode_loop+0x53/0xc0
   [<ffffffff8100105a>] ? trace_hardirqs_on_thunk+0x1a/0x1c
   [<ffffffff81076e6e>] SyS_reboot+0xe/0x10
   [<ffffffff814c4ba5>] entry_SYSCALL_64_fastpath+0x18/0xa3
  Code: 7c 4c 8d af c0 01 00 00 49 89 fe eb 10 48 09 c2 4c 89 e0 49 0f b1 55 00 4c 39 e0 74 35 4d 8b a6 c0 01 00 00 41 8b 8e 60 01 00 00 <0f> 33 8b 35 6e 02 8c 00 48 c1 e2 20 85 f6 7e d2 48 89 d3 89 cf
  RIP  [<ffffffff81003c92>] x86_perf_event_update+0x52/0xc0
   RSP <ffff880123853b60>
  ---[ end trace 7ec95181faf211be ]---
  note: reboot[3070] exited with preempt_count 2

Cc: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Fixes: f5967101e9 ("x86/hweight: Get rid of the special calling convention")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-08 10:58:25 -07:00
Al Viro
784d5699ed x86: move exports to actual definitions
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-08-07 23:47:15 -04:00
Linus Torvalds
80fac0f577 * ARM bugfix and MSI injection support
* x86 nested virt tweak and OOPS fix
 * Simplify pvclock code (vdso bits acked by Andy Lutomirski).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJXpKOnAAoJEL/70l94x66D5z4H/R2660Vy3brQrI8lGxCtkXJt
 AVe8PwI8nDfYJ/UkMZ2KcHPSvy+sHW2ydaZXYNqXHVBeTaUxiPW9rTgK61ebypGL
 1tPOgJ3kGZF6XEdAz6gS8LniNFc+D3W6Y6sRylkEsqPj39/hxe7QMoOMSCQ9imbW
 WMIx7/81i1EMw6oi+9FVtq+yHCpvyfFnD8t1TDsYWOReVn1J15SxbEs4Ih+hBMLz
 HZ5DEjp9cAmzeR7GLje5eH1t6TEEoNb1MNgFWuscoAsDf8D9DKqRB9s0hC+TLFYn
 oZbGSqjQwu3/VMblgedinH6X9MTm8V0zW29ToGnDcoO00AUmdlNmXSaZUhvT/Rs=
 =H5cD
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more KVM updates from Paolo Bonzini:
 - ARM bugfix and MSI injection support
 - x86 nested virt tweak and OOPS fix
 - Simplify pvclock code (vdso bits acked by Andy Lutomirski).

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  nvmx: mark ept single context invalidation as supported
  nvmx: remove comment about missing nested vpid support
  KVM: lapic: fix access preemption timer stuff even if kernel_irqchip=off
  KVM: documentation: fix KVM_CAP_X2APIC_API information
  x86: vdso: use __pvclock_read_cycles
  pvclock: introduce seqcount-like API
  arm64: KVM: Set cpsr before spsr on fault injection
  KVM: arm: vgic-irqfd: Workaround changing kvm_set_routing_entry prototype
  KVM: arm/arm64: Enable MSI routing
  KVM: arm/arm64: Enable irqchip routing
  KVM: Move kvm_setup_default/empty_irq_routing declaration in arch specific header
  KVM: irqchip: Convey devid to kvm_set_msi
  KVM: Add devid in kvm_kernel_irq_routing_entry
  KVM: api: Pass the devid in the msi routing entry
2016-08-06 09:18:21 -04:00
Linus Torvalds
db8262787e Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Mostly tooling fixes and some late tooling updates, plus two perf
  related printk message fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf tests bpf: Use SyS_epoll_wait alias
  perf tests: objdump output can contain multi byte chunks
  perf record: Add --sample-cpu option
  perf hists: Introduce output_resort_cb method
  perf tools: Move config/Makefile into Makefile.config
  perf tests: Add test for bitmap_scnprintf function
  tools lib: Add bitmap_and function
  tools lib: Add bitmap_scnprintf function
  tools lib: Add bitmap_alloc function
  tools lib traceevent: Ignore generated library files
  perf tools: Fix build failure on perl script context
  perf/core: Change log level for duration warning to KERN_INFO
  perf annotate: Plug filename string leak
  perf annotate: Introduce strerror for handling symbol__disassemble() errors
  perf annotate: Rename symbol__annotate() to symbol__disassemble()
  perf/x86: Modify error message in virtualized environment
  perf target: str_error_r() always returns the buffer it receives
  perf annotate: Use pipe + fork instead of popen
  perf evsel: Introduce constructor for cycles event
2016-08-06 09:10:36 -04:00
Linus Torvalds
c98f5827f8 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Two fixes and a cleanup-fix, to the syscall entry code and to ptrace"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/syscalls/64: Add compat_sys_keyctl for 32-bit userspace
  x86/ptrace: Stop setting TS_COMPAT in ptrace code
  x86/vdso: Error out if the vDSO isn't a valid DSO
2016-08-06 09:04:35 -04:00
Linus Torvalds
11d8ec408d More power management updates for v4.8-rc1
- Prevent the low-level assembly hibernate code on x86-64 from
    referring to __PAGE_OFFSET directly as a symbol which doesn't work
    when the kernel identity mapping base is randomized, in which case
    __PAGE_OFFSET is a variable (Rafael Wysocki).
 
  - Avoid selecting CPU_FREQ_STAT by default as the statistics are not
    required for proper cpufreq operation (Borislav Petkov).
 
  - Add Skylake-X and Broadwell-X IDs to the intel_pstate's list of
    processors where out-of-band (OBB) control of P-states is possible
    and if that is in use, intel_pstate should not attempt to manage
    P-states (Srinivas Pandruvada).
 
  - Drop some unnecessary checks from the wakeup IRQ handling code in
    the PM core (Markus Elfring).
 
  - Reduce the number operating performance point (OPP) lookups in
    one of the OPP framework's helper functions (Jisheng Zhang).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJXpKLNAAoJEILEb/54YlRx6L4P/39GR0kVB7vIyCajdWc4f3gh
 0zh5RbKC0YT4F6upebzgQ9uYS9cv4y+df7ShVwKQAea3wDReEmZhM/egOGw0Ls+8
 SS1MiJq1LSekyMIWH6cXwZsH69/V0LuWTBbzWYBgHUDbfEMlgwV5ZZMEH14/2bWw
 d4SLUUiW5P42im+IDAxpdYneKOrbJo3txj6WbOutgtIrHdPko6lF1dDouKvI1QTk
 zCBOkEB9nELq3rWN/sbPmHzbmbj/yFiiHk5+iqwuKKJZI8PQB9/C6Qmc3wvjtPpc
 GXLPI+OHqLgBMofGsiKOvm3hPQAIjf/ERsUilHLE3qOi/Mi0qj7U4dFivZrPWaCG
 j2bD+b36TffmG1r8L7NYbEKU60syeIFSRqAngbyswu6XF+NdVboaifENGcgM3tBC
 pUC1mFh/4PMKP5zW9mOwE6WSntZkw14CVR+A3fRFOuTBavNvjGdckwLi/aBsZdU3
 K4DJUFzdELF7+JdqnQV35yV2tgMbJhxQa/QBykiFBh3AyqliOZ8uBoIxxLinrGmH
 XWR3kB4ZPRBIStGI9IpG3lNhLLU7mLIdaFhGayicicAwFcLsXN7oKWVzYfUqWA9T
 1ptXApcRf6S9J1JnKHkznoQ1/D1pNYLbH+t7BOtWdK8SWBhMS2Lzo2kyKWsCFkJS
 16J8j1tcLDhN1dvcBT55
 =WvPB
 -----END PGP SIGNATURE-----

Merge tag 'pm-extra-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more power management updates from Rafael Wysocki:
 "A few more fixes and cleanups in the x86-64 low-level hibernation
  code, PM core, cpufreq (Kconfig and intel_pstate), and the operating
  points framework.

  Specifics:

   - Prevent the low-level assembly hibernate code on x86-64 from
     referring to __PAGE_OFFSET directly as a symbol which doesn't work
     when the kernel identity mapping base is randomized, in which case
     __PAGE_OFFSET is a variable (Rafael Wysocki).

   - Avoid selecting CPU_FREQ_STAT by default as the statistics are not
     required for proper cpufreq operation (Borislav Petkov).

   - Add Skylake-X and Broadwell-X IDs to the intel_pstate's list of
     processors where out-of-band (OBB) control of P-states is possible
     and if that is in use, intel_pstate should not attempt to manage
     P-states (Srinivas Pandruvada).

   - Drop some unnecessary checks from the wakeup IRQ handling code in
     the PM core (Markus Elfring).

   - Reduce the number operating performance point (OPP) lookups in one
     of the OPP framework's helper functions (Jisheng Zhang)"

* tag 'pm-extra-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  x86/power/64: Do not refer to __PAGE_OFFSET from assembly code
  cpufreq: Do not default-yes CPU_FREQ_STAT
  cpufreq: intel_pstate: Add more out-of-band IDs
  PM / OPP: optimize dev_pm_opp_set_rate() performance a bit
  PM-wakeup: Delete unnecessary checks before three function calls
2016-08-05 23:26:16 -04:00
Linus Torvalds
6c84239d59 RTC for 4.8
Cleanups:
  - huge cleanup of rtc-generic and char/genrtc this allowed to cleanup rtc-cmos,
   rtc-sh, rtc-m68k, rtc-powerpc and rtc-parisc
  - move mn10300 to rtc-cmos
 
 Subsystem:
  - fix wakealarms after hibernate
  - multiples fixes for rctest
  - simplify implementations of .read_alarm
 
 New drivers:
  - Maxim MAX6916
 
 Drivers:
  - ds1307: fix weekday
  - m41t80: add wakeup support
  - pcf85063: add support for PCF85063A variant
  - rv8803: extend i2c fix and other fixes
  - s35390a: fix alarm reading, this fixes instant reboot after shutdown for QNAP
    TS-41x
  - s3c: clock fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCgAGBQJXokhIAAoJENiigzvaE+LCZqQP+wWzintN/N1u3dKiVB7iSdwq
 +S/jAXD9wW8OK9PI60/YUGRYeUXmZW9t4XYg1VKCxU9KpVC17LgOtDyXD8BufP1V
 uREJEzZw9O7zCCjeHp/ICFjBkc62Net6ZDOO+ZyXPNfddpS1Xq1uUgXLZc/202UR
 ID/kewu0pJRDnoxyqznWn9+8D33w/ygXs2slY2Ive0ONtjdgxGcsj2rNbb2RYn2z
 OP7br3lLg7qkFh4TtXb61eh/9GYIk6wzP/CrX5l/jH4SjQnrIk5g/X/Cd1qQ/qso
 JZzFoonOKvIp5Gw/+fZ9NP3YFcnkoRMv4NjZV8PAmsYLds+ibRiBcoB8u6FmiJV7
 WW5uopgPkfCGN5BV3+QHwJDVe+WlgnlzaT5zPUCcP5KWusDts4fWIgzP7vrtAzf4
 3OJLrgSGdBeOqWnJD21nxKUD27JOseX7D+BFtwxR4lMsXHqlHJfETpZ8gts1ZGH3
 2U353j/jkZvGWmc6dMcuxOXT2K4VqpYeIIqs0IcLu6hM9crtR89zPR2Iu1AilfDW
 h2NroF+Q//SgMMzWoTEG6Tn7RAc7MthgA/tRCFZF9CBMzNs988w0CTHnKsIHmjpU
 UKkMeJGAC9YrPYIcqrg0oYsmLUWXc8JuZbGJBnei3BzbaMTlcwIN9qj36zfq6xWc
 TMLpbWEoIsgFIZMP/hAP
 =rpGB
 -----END PGP SIGNATURE-----

Merge tag 'rtc-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux

Pull RTC updates from Alexandre Belloni:
 "RTC for 4.8

  Cleanups:
   - huge cleanup of rtc-generic and char/genrtc this allowed to cleanup
     rtc-cmos, rtc-sh, rtc-m68k, rtc-powerpc and rtc-parisc
   - move mn10300 to rtc-cmos

  Subsystem:
   - fix wakealarms after hibernate
   - multiples fixes for rctest
   - simplify implementations of .read_alarm

  New drivers:
   - Maxim MAX6916

  Drivers:
   - ds1307: fix weekday
   - m41t80: add wakeup support
   - pcf85063: add support for PCF85063A variant
   - rv8803: extend i2c fix and other fixes
   - s35390a: fix alarm reading, this fixes instant reboot after
     shutdown for QNAP TS-41x
   - s3c: clock fixes"

* tag 'rtc-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (65 commits)
  rtc: rv8803: Clear V1F when setting the time
  rtc: rv8803: Stop the clock while setting the time
  rtc: rv8803: Always apply the I²C workaround
  rtc: rv8803: Fix read day of week
  rtc: rv8803: Remove the check for valid time
  rtc: rv8803: Kconfig: Indicate rx8900 support
  rtc: asm9260: remove .owner field for driver
  rtc: at91sam9: Fix missing spin_lock_init()
  rtc: m41t80: add suspend handlers for alarm IRQ
  rtc: m41t80: make it a real error message
  rtc: pcf85063: Add support for the PCF85063A device
  rtc: pcf85063: fix year range
  rtc: hym8563: in .read_alarm set .tm_sec to 0 to signal minute accuracy
  rtc: explicitly set tm_sec = 0 for drivers with minute accurancy
  rtc: s3c: Add s3c_rtc_{enable/disable}_clk in s3c_rtc_setfreq()
  rtc: s3c: Remove unnecessary call to disable already disabled clock
  rtc: abx80x: use devm_add_action_or_reset()
  rtc: m41t80: use devm_add_action_or_reset()
  rtc: fix a typo and reduce three empty lines to one
  rtc: s35390a: improve two comments in .set_alarm
  ...
2016-08-05 09:48:22 -04:00
Rafael J. Wysocki
e2b3b80de5 Merge branches 'pm-sleep', 'pm-cpufreq', 'pm-core' and 'pm-opp'
* pm-sleep:
  x86/power/64: Do not refer to __PAGE_OFFSET from assembly code

* pm-cpufreq:
  cpufreq: Do not default-yes CPU_FREQ_STAT
  cpufreq: intel_pstate: Add more out-of-band IDs

* pm-core:
  PM-wakeup: Delete unnecessary checks before three function calls

* pm-opp:
  PM / OPP: optimize dev_pm_opp_set_rate() performance a bit
2016-08-05 15:46:55 +02:00
Linus Torvalds
9e0243db61 Merge branch 'for-linus-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
 "Beside of various fixes this also contains patches to enable features
  such was Kcov, kmemleak and TRACE_IRQFLAGS_SUPPORT on UML"

* 'for-linus-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  hostfs: Freeing an ERR_PTR in hostfs_fill_sb_common()
  um: Support kcov
  um: Enable TRACE_IRQFLAGS_SUPPORT
  um: Use asm-generic/irqflags.h
  um: Fix possible deadlock in sig_handler_common()
  um: Select HAVE_DEBUG_KMEMLEAK
  um: Setup physical memory in setup_arch()
  um: Eliminate null test after alloc_bootmem
2016-08-04 19:37:59 -04:00
Krzysztof Kozlowski
00085f1efa dma-mapping: use unsigned long for dma_attrs
The dma-mapping core and the implementations do not change the DMA
attributes passed by pointer.  Thus the pointer can point to const data.
However the attributes do not have to be a bitfield.  Instead unsigned
long will do fine:

1. This is just simpler.  Both in terms of reading the code and setting
   attributes.  Instead of initializing local attributes on the stack
   and passing pointer to it to dma_set_attr(), just set the bits.

2. It brings safeness and checking for const correctness because the
   attributes are passed by value.

Semantic patches for this change (at least most of them):

    virtual patch
    virtual context

    @r@
    identifier f, attrs;

    @@
    f(...,
    - struct dma_attrs *attrs
    + unsigned long attrs
    , ...)
    {
    ...
    }

    @@
    identifier r.f;
    @@
    f(...,
    - NULL
    + 0
     )

and

    // Options: --all-includes
    virtual patch
    virtual context

    @r@
    identifier f, attrs;
    type t;

    @@
    t f(..., struct dma_attrs *attrs);

    @@
    identifier r.f;
    @@
    f(...,
    - NULL
    + 0
     )

Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.com
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
Acked-by: Mark Salter <msalter@redhat.com> [c6x]
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris]
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm]
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp]
Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core]
Acked-by: David Vrabel <david.vrabel@citrix.com> [xen]
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb]
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon]
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32]
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc]
Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-04 08:50:07 -04:00
Masahiro Yamada
97f2645f35 tree-wide: replace config_enabled() with IS_ENABLED()
The use of config_enabled() against config options is ambiguous.  In
practical terms, config_enabled() is equivalent to IS_BUILTIN(), but the
author might have used it for the meaning of IS_ENABLED().  Using
IS_ENABLED(), IS_BUILTIN(), IS_MODULE() etc.  makes the intention
clearer.

This commit replaces config_enabled() with IS_ENABLED() where possible.
This commit is only touching bool config options.

I noticed two cases where config_enabled() is used against a tristate
option:

 - config_enabled(CONFIG_HWMON)
  [ drivers/net/wireless/ath/ath10k/thermal.c ]

 - config_enabled(CONFIG_BACKLIGHT_CLASS_DEVICE)
  [ drivers/gpu/drm/gma500/opregion.c ]

I did not touch them because they should be converted to IS_BUILTIN()
in order to keep the logic, but I was not sure it was the authors'
intention.

Link: http://lkml.kernel.org/r/1465215656-20569-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Stas Sergeev <stsp@list.ru>
Cc: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: Joshua Kinard <kumba@gentoo.org>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: "Dmitry V. Levin" <ldv@altlinux.org>
Cc: yu-cheng yu <yu-cheng.yu@intel.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Will Drewry <wad@chromium.org>
Cc: Nikolay Martynov <mar.kolya@gmail.com>
Cc: Huacai Chen <chenhc@lemote.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
Cc: Rafal Milecki <zajec5@gmail.com>
Cc: James Cowgill <James.Cowgill@imgtec.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Alex Smith <alex.smith@imgtec.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Qais Yousef <qais.yousef@imgtec.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Brian Norris <computersforpeace@gmail.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: "Luis R. Rodriguez" <mcgrof@do-not-panic.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Kalle Valo <kvalo@qca.qualcomm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Tony Wu <tung7970@gmail.com>
Cc: Huaitong Han <huaitong.han@intel.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrea Gelmini <andrea.gelmini@gelma.net>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Rabin Vincent <rabin@rab.in>
Cc: "Maciej W. Rozycki" <macro@imgtec.com>
Cc: David Daney <david.daney@cavium.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-04 08:50:07 -04:00
Bandan Das
45e11817d5 nvmx: mark ept single context invalidation as supported
Commit 4b85507860 ("KVM: nVMX: Don't advertise single
context invalidation for invept") removed advertising
single context invalidation since the spec does not mandate it.
However, some hypervisors (such as ESX) require it to be present
before willing to use ept in a nested environment. Advertise it
and fallback to the global case.

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-04 14:21:52 +02:00
Bandan Das
03331b4b8b nvmx: remove comment about missing nested vpid support
Nested vpid is already supported and both single/global
modes are advertised to the guest

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-04 14:21:51 +02:00
Wanpeng Li
910053002e KVM: lapic: fix access preemption timer stuff even if kernel_irqchip=off
BUG: unable to handle kernel NULL pointer dereference at 000000000000008c
IP: [<ffffffffc04e0180>] kvm_lapic_hv_timer_in_use+0x10/0x20 [kvm]
PGD 0
Oops: 0000 [#1] SMP
Call Trace:
 kvm_arch_vcpu_load+0x86/0x260 [kvm]
 vcpu_load+0x46/0x60 [kvm]
 kvm_vcpu_ioctl+0x79/0x7c0 [kvm]
 ? __lock_is_held+0x54/0x70
 do_vfs_ioctl+0x96/0x6a0
 ? __fget_light+0x2a/0x90
 SyS_ioctl+0x79/0x90
 do_syscall_64+0x7c/0x1e0
 entry_SYSCALL64_slow_path+0x25/0x25
RIP  [<ffffffffc04e0180>] kvm_lapic_hv_timer_in_use+0x10/0x20 [kvm]
 RSP <ffff8800db1f3d70>
CR2: 000000000000008c
---[ end trace a55fb79d2b3b4ee8 ]---

This can be reproduced steadily by kernel_irqchip=off.

We should not access preemption timer stuff if lapic is emulated in userspace.
This patch fix it by avoiding access preemption timer stuff when kernel_irqchip=off.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-04 14:21:43 +02:00
Paolo Bonzini
6f49b2f341 KVM/ARM Changes for v4.8 - Take 2
Includes GSI routing support to go along with the new VGIC and a small fix that
 has been cooking in -next for a while.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXoydqAAoJEEtpOizt6ddyM3oH/1A4VeG/J9q4fBPXqY2tVWXs
 c3P7UgNcrEgUNs/F9ykQY/lb31deecUzaBt1OyTf+RlsNbihq3dQdYcBhxtUODw/
 Faok582ya3UFgLW+IRHcID0EbkVOpIzMhOStYsnU/Dz7HG1JL9HdPzwkid7iu9LT
 fI6yrrBnJFjdWAAQ4BkcEKBENRsY8NTs7jX5vnFA92MkUBby7BmariPDD3FtrB+f
 Ob9B7CxM30pNqsN7OA/QvFOHMJHxf3s1TBKwmPHe5TLIfSzV1YxcEGiMc0lWqF4v
 BT8ZeMGCtjDw94tND1DskfQQRPaMqPmGuRTrAW/IuE2n92bFtbqIqs7Cbw0fzLE=
 =Vm6Q
 -----END PGP SIGNATURE-----

Merge tag 'kvm-arm-for-4.8-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/ARM Changes for v4.8 - Take 2

Includes GSI routing support to go along with the new VGIC and a small fix that
has been cooking in -next for a while.
2016-08-04 13:59:56 +02:00
Paolo Bonzini
abe9efa79b x86: vdso: use __pvclock_read_cycles
The new simplified __pvclock_read_cycles does the same computation
as vread_pvclock, except that (because it takes the pvclock_vcpu_time_info
pointer) it has to be moved inside the loop.  Since the loop is expected to
never roll, this makes no difference.

Acked-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-04 13:52:21 +02:00
Paolo Bonzini
3aed64f6d3 pvclock: introduce seqcount-like API
The version field in struct pvclock_vcpu_time_info basically implements
a seqcount.  Wrap it with the usual read_begin and read_retry functions,
and use these APIs instead of peppering the code with smp_rmb()s.
While at it, change it to the more pedantically correct virt_rmb().

With this change, __pvclock_read_cycles can be simplified noticeably.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-04 13:52:21 +02:00
Vegard Nossum
915eed20e4 um: Support kcov
This adds support for kcov to UML.

There is a small problem where UML will randomly segfault during boot;
this is because current_thread_info() occasionally returns an invalid
(non-NULL) pointer and we try to dereference it in
__sanitizer_cov_trace_pc(). I consider this a bug in UML itself and this
patch merely exposes it.

[v2: disable instrumentation in UML-specific code]

Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Thomas Meyer <thomas@m3y3r.de>
Cc: user-mode-linux-devel <user-mode-linux-devel@lists.sourceforge.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2016-08-04 00:18:06 +02:00
Linus Torvalds
d52bd54db8 Merge branch 'akpm' (patches from Andrew)
Merge yet more updates from Andrew Morton:

 - the rest of ocfs2

 - various hotfixes, mainly MM

 - quite a bit of misc stuff - drivers, fork, exec, signals, etc.

 - printk updates

 - firmware

 - checkpatch

 - nilfs2

 - more kexec stuff than usual

 - rapidio updates

 - w1 things

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (111 commits)
  ipc: delete "nr_ipc_ns"
  kcov: allow more fine-grained coverage instrumentation
  init/Kconfig: add clarification for out-of-tree modules
  config: add android config fragments
  init/Kconfig: ban CONFIG_LOCALVERSION_AUTO with allmodconfig
  relay: add global mode support for buffer-only channels
  init: allow blacklisting of module_init functions
  w1:omap_hdq: fix regression
  w1: add helper macro module_w1_family
  w1: remove need for ida and use PLATFORM_DEVID_AUTO
  rapidio/switches: add driver for IDT gen3 switches
  powerpc/fsl_rio: apply changes for RIO spec rev 3
  rapidio: modify for rev.3 specification changes
  rapidio: change inbound window size type to u64
  rapidio/idt_gen2: fix locking warning
  rapidio: fix error handling in mbox request/release functions
  rapidio/tsi721_dma: advance queue processing from transfer submit call
  rapidio/tsi721: add messaging mbox selector parameter
  rapidio/tsi721: add PCIe MRRS override parameter
  rapidio/tsi721_dma: add channel mask and queue size parameters
  ...
2016-08-02 21:08:07 -04:00
Rafael J. Wysocki
c226fab474 x86/power/64: Do not refer to __PAGE_OFFSET from assembly code
When CONFIG_RANDOMIZE_MEMORY is set on x86-64, __PAGE_OFFSET becomes
a variable and using it as a symbol in the image memory restoration
assembly code under core_restore_code is not correct any more.

To avoid that problem, modify set_up_temporary_mappings() to compute
the physical address of the temporary page tables and store it in
temp_level4_pgt, so that the value of that variable is ready to be
written into CR3.  Then, the assembly code doesn't have to worry
about converting that value into a physical address and things work
regardless of whether or not CONFIG_RANDOMIZE_MEMORY is set.

Reported-and-tested-by: Thomas Garnier <thgarnie@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-08-03 01:35:38 +02:00
Petr Tesarik
c025311596 kexec: allow kdump with crash_kexec_post_notifiers
If a crash kernel is loaded, do not crash the running domain.  This is
needed if the kernel is loaded with crash_kexec_post_notifiers, because
panic notifiers are run before __crash_kexec() in that case, and this
Xen hook prevents its being called later.

[akpm@linux-foundation.org: build fix: unconditionally include kexec.h]
Link: http://lkml.kernel.org/r/20160713122000.14969.99963.stgit@hananiah.suse.cz
Signed-off-by: Petr Tesarik <ptesarik@suse.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 19:35:30 -04:00
Andy Lutomirski
7e7814180b signal: consolidate {TS,TLF}_RESTORE_SIGMASK code
In general, there's no need for the "restore sigmask" flag to live in
ti->flags.  alpha, ia64, microblaze, powerpc, sh, sparc (64-bit only),
tile, and x86 use essentially identical alternative implementations,
placing the flag in ti->status.

Replace those optimized implementations with an equally good common
implementation that stores it in a bitfield in struct task_struct and
drop the custom implementations.

Additional architectures can opt in by removing their
TIF_RESTORE_SIGMASK defines.

Link: http://lkml.kernel.org/r/8a14321d64a28e40adfddc90e18a96c086a6d6f9.1468522723.git.luto@kernel.org
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>	[powerpc]
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 19:35:23 -04:00
Fabian Frederick
bd721ea73e treewide: replace obsolete _refok by __ref
There was only one use of __initdata_refok and __exit_refok

__init_refok was used 46 times against 82 for __ref.

Those definitions are obsolete since commit 312b1485fb ("Introduce new
section reference annotations tags: __ref, __refdata, __refconst")

This patch removes the following compatibility definitions and replaces
them treewide.

/* compatibility defines */
#define __init_refok     __ref
#define __initdata_refok __refdata
#define __exit_refok     __ref

I can also provide separate patches if necessary.
(One patch per tree and check in 1 month or 2 to remove old definitions)

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/1466796271-3043-1-git-send-email-fabf@skynet.be
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 17:31:41 -04:00
Linus Torvalds
c8d0267efd PCI changes for the v4.8 merge window:
Enumeration
     Move ecam.h to linux/include/pci-ecam.h (Jayachandran C)
     Add parent device field to ECAM struct pci_config_window (Jayachandran C)
     Add generic MCFG table handling (Tomasz Nowicki)
     Refactor pci_bus_assign_domain_nr() for CONFIG_PCI_DOMAINS_GENERIC (Tomasz Nowicki)
     Factor DT-specific pci_bus_find_domain_nr() code out (Tomasz Nowicki)
 
   Resource management
     Add devm_request_pci_bus_resources() (Bjorn Helgaas)
     Unify pci_resource_to_user() declarations (Bjorn Helgaas)
     Implement pci_resource_to_user() with pcibios_resource_to_bus() (microblaze, powerpc, sparc) (Bjorn Helgaas)
     Request host bridge window resources (designware, iproc, rcar, xgene, xilinx, xilinx-nwl) (Bjorn Helgaas)
     Make PCI I/O space optional on ARM32 (Bjorn Helgaas)
     Ignore write combining when mapping I/O port space (Bjorn Helgaas)
     Claim bus resources on MIPS PCI_PROBE_ONLY set-ups (Bjorn Helgaas)
     Remove unicore32 pci=firmware command line parameter handling (Bjorn Helgaas)
     Support I/O resources when parsing host bridge resources (Jayachandran C)
     Add helpers to request/release memory and I/O regions (Johannes Thumshirn)
     Use pci_(request|release)_mem_regions (NVMe, lpfc, GenWQE, ethernet/intel, alx) (Johannes Thumshirn)
     Extend pci=resource_alignment to specify device/vendor IDs (Koehrer Mathias (ETAS/ESW5))
     Add generic pci_bus_claim_resources() (Lorenzo Pieralisi)
     Claim bus resources on ARM32 PCI_PROBE_ONLY set-ups (Lorenzo Pieralisi)
     Remove ARM32 and ARM64 arch-specific pcibios_enable_device() (Lorenzo Pieralisi)
     Add pci_unmap_iospace() to unmap I/O resources (Sinan Kaya)
     Remove powerpc __pci_mmap_set_pgprot() (Yinghai Lu)
 
   PCI device hotplug
     Allow additional bus numbers for hotplug bridges (Keith Busch)
     Ignore interrupts during D3cold (Lukas Wunner)
 
   Power management
     Enforce type casting for pci_power_t (Andy Shevchenko)
     Don't clear d3cold_allowed for PCIe ports (Mika Westerberg)
     Put PCIe ports into D3 during suspend (Mika Westerberg)
     Power on bridges before scanning new devices (Mika Westerberg)
     Runtime resume bridge before rescan (Mika Westerberg)
     Add runtime PM support for PCIe ports (Mika Westerberg)
     Remove redundant check of pcie_set_clkpm (Shawn Lin)
 
   Virtualization
     Add function 1 DMA alias quirk for Marvell 88SE9182 (Aaron Sierra)
     Add DMA alias quirk for Adaptec 3805 (Alex Williamson)
     Mark Atheros AR9485 and QCA9882 to avoid bus reset (Chris Blake)
     Add ACS quirk for Solarflare SFC9220 (Edward Cree)
 
   MSI
     Fix PCI_MSI dependencies (Arnd Bergmann)
     Add pci_msix_desc_addr() helper (Christoph Hellwig)
     Switch msix_program_entries() to use pci_msix_desc_addr() (Christoph Hellwig)
     Make the "entries" argument to pci_enable_msix() optional (Christoph Hellwig)
     Provide sensible IRQ vector alloc/free routines (Christoph Hellwig)
     Spread interrupt vectors in pci_alloc_irq_vectors() (Christoph Hellwig)
 
   Error Handling
     Bind DPC to Root Ports as well as Downstream Ports (Keith Busch)
     Remove DPC tristate module option (Keith Busch)
     Convert Downstream Port Containment driver to use devm_* functions (Mika Westerberg)
 
   Generic host bridge driver
     Select IRQ_DOMAIN (Arnd Bergmann)
     Claim bus resources on PCI_PROBE_ONLY set-ups (Lorenzo Pieralisi)
 
   ACPI host bridge driver
     Add ARM64 acpi_pci_bus_find_domain_nr() (Tomasz Nowicki)
     Add ARM64 ACPI support for legacy IRQs parsing and consolidation with DT code (Tomasz Nowicki)
     Implement ARM64 AML accessors for PCI_Config region (Tomasz Nowicki)
     Support ARM64 ACPI-based PCI host controller (Tomasz Nowicki)
 
   Altera host bridge driver
     Check link status before retrain link (Ley Foon Tan)
     Poll for link up status after retraining the link (Ley Foon Tan)
 
   Axis ARTPEC-6 host bridge driver
     Add PCI_MSI_IRQ_DOMAIN dependency (Arnd Bergmann)
     Add DT binding for Axis ARTPEC-6 PCIe controller (Niklas Cassel)
     Add Axis ARTPEC-6 PCIe controller driver (Niklas Cassel)
 
   Intel VMD host bridge driver
     Use lock save/restore in interrupt enable path (Jon Derrick)
     Select device dma ops to override (Keith Busch)
     Initialize list item in IRQ disable (Keith Busch)
     Use x86_vector_domain as parent domain (Keith Busch)
     Separate MSI and MSI-X vector sharing (Keith Busch)
 
   Marvell Aardvark host bridge driver
     Add DT binding for the Aardvark PCIe controller (Thomas Petazzoni)
     Add Aardvark PCI host controller driver (Thomas Petazzoni)
     Add Aardvark PCIe support for Armada 3700 (Thomas Petazzoni)
 
   Microsoft Hyper-V host bridge driver
     Fix interrupt cleanup path (Cathy Avery)
     Don't leak buffer in hv_pci_onchannelcallback() (Vitaly Kuznetsov)
     Handle all pending messages in hv_pci_onchannelcallback() (Vitaly Kuznetsov)
 
   NVIDIA Tegra host bridge driver
     Program PADS_REFCLK_CFG* always, not just on legacy SoCs (Stephen Warren)
     Program PADS_REFCLK_CFG* registers with per-SoC values (Stephen Warren)
     Use lower-case hex consistently for register definitions (Thierry Reding)
     Use generic pci_remap_iospace() rather than ARM32-specific one (Thierry Reding)
     Stop setting pcibios_min_mem (Thierry Reding)
 
   Renesas R-Car host bridge driver
     Drop gen2 dummy I/O port region (Bjorn Helgaas)
 
   TI DRA7xx host bridge driver
     Fix return value in case of error (Christophe JAILLET)
 
   Xilinx AXI host bridge driver
     Fix return value in case of error (Christophe JAILLET)
 
   Miscellaneous
     Make bus_attr_resource_alignment static (Ben Dooks)
     Include <asm/dma.h> for isa_dma_bridge_buggy (Ben Dooks)
     MAINTAINERS: Add file patterns for PCI device tree bindings (Geert Uytterhoeven)
     Make host bridge drivers explicitly non-modular (Paul Gortmaker)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXoNRtAAoJEFmIoMA60/r8LMkP/3kiNh21QFS6RZGOaDft5/Py
 n14Zo0w51avspxoI3iyDlBd5q/SssMqi+2c6Ko/fh2D2xMxJgmQOjdMDrIGARxGA
 qEHk/5IoXquY2/GcptmCk3ap66cJ6kTovS4OPrb73m3fPuknFwFwdzExq22XHbnI
 crPya6xwQxPLc54VpY/TsgW8E+EKZd/3FW9wuzzNHXrXmTILyhBQzQAA0K470GMx
 wEXU6kc3M/XhRuF1zjV9/O+H/xguwfnbTpZLvd2NAF6uXKZoRytEHHtNnVqu1hoe
 UPpDS2xq32pMNbGxGqBetCdIbkY/hWOufmckHI7Yu2OfXBYyHBYMG2je1+nMPkOV
 WiFhhrchGt5KnEMUwXPS4ROqnSZVpZBl1Fd4s10GhUYkoE2HNKJXta398H9FR1jj
 4NEVSi4mSX/+CkaoIN3lXYiaf9P0wv4Wppve4Scr30+VnLjJhm7Vw5La7v12oo6x
 otrJ/g98AkmnbuUdLeWBUS/+TOcdPjZYbw52rqBsbOOjFm51Zcj6D7kf5WcTypQy
 HzbvygSVabcioWehUG1uudC8pdJmQlUGx1aES/iu+mZEae4cuUFALu6hDBD9IYnZ
 5JdwjVzI0UItEwT3rQt3t4xiAqHADQ0NAVNJVCeREdoy/YQpSoTWGXIpyqCZ1yCm
 aBykjRsxbKQXlhVeIxuc
 =NVxu
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.8-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
 "Highlights:

   - ARM64 support for ACPI host bridges

   - new drivers for Axis ARTPEC-6 and Marvell Aardvark

   - new pci_alloc_irq_vectors() interface for MSI-X, MSI, legacy INTx

   - pci_resource_to_user() cleanup (more to come)

  Detailed summary:

  Enumeration:
   - Move ecam.h to linux/include/pci-ecam.h (Jayachandran C)
   - Add parent device field to ECAM struct pci_config_window (Jayachandran C)
   - Add generic MCFG table handling (Tomasz Nowicki)
   - Refactor pci_bus_assign_domain_nr() for CONFIG_PCI_DOMAINS_GENERIC (Tomasz Nowicki)
   - Factor DT-specific pci_bus_find_domain_nr() code out (Tomasz Nowicki)

  Resource management:
   - Add devm_request_pci_bus_resources() (Bjorn Helgaas)
   - Unify pci_resource_to_user() declarations (Bjorn Helgaas)
   - Implement pci_resource_to_user() with pcibios_resource_to_bus() (microblaze, powerpc, sparc) (Bjorn Helgaas)
   - Request host bridge window resources (designware, iproc, rcar, xgene, xilinx, xilinx-nwl) (Bjorn Helgaas)
   - Make PCI I/O space optional on ARM32 (Bjorn Helgaas)
   - Ignore write combining when mapping I/O port space (Bjorn Helgaas)
   - Claim bus resources on MIPS PCI_PROBE_ONLY set-ups (Bjorn Helgaas)
   - Remove unicore32 pci=firmware command line parameter handling (Bjorn Helgaas)
   - Support I/O resources when parsing host bridge resources (Jayachandran C)
   - Add helpers to request/release memory and I/O regions (Johannes Thumshirn)
   - Use pci_(request|release)_mem_regions (NVMe, lpfc, GenWQE, ethernet/intel, alx) (Johannes Thumshirn)
   - Extend pci=resource_alignment to specify device/vendor IDs (Koehrer Mathias (ETAS/ESW5))
   - Add generic pci_bus_claim_resources() (Lorenzo Pieralisi)
   - Claim bus resources on ARM32 PCI_PROBE_ONLY set-ups (Lorenzo Pieralisi)
   - Remove ARM32 and ARM64 arch-specific pcibios_enable_device() (Lorenzo Pieralisi)
   - Add pci_unmap_iospace() to unmap I/O resources (Sinan Kaya)
   - Remove powerpc __pci_mmap_set_pgprot() (Yinghai Lu)

  PCI device hotplug:
   - Allow additional bus numbers for hotplug bridges (Keith Busch)
   - Ignore interrupts during D3cold (Lukas Wunner)

  Power management:
   - Enforce type casting for pci_power_t (Andy Shevchenko)
   - Don't clear d3cold_allowed for PCIe ports (Mika Westerberg)
   - Put PCIe ports into D3 during suspend (Mika Westerberg)
   - Power on bridges before scanning new devices (Mika Westerberg)
   - Runtime resume bridge before rescan (Mika Westerberg)
   - Add runtime PM support for PCIe ports (Mika Westerberg)
   - Remove redundant check of pcie_set_clkpm (Shawn Lin)

  Virtualization:
   - Add function 1 DMA alias quirk for Marvell 88SE9182 (Aaron Sierra)
   - Add DMA alias quirk for Adaptec 3805 (Alex Williamson)
   - Mark Atheros AR9485 and QCA9882 to avoid bus reset (Chris Blake)
   - Add ACS quirk for Solarflare SFC9220 (Edward Cree)

  MSI:
   - Fix PCI_MSI dependencies (Arnd Bergmann)
   - Add pci_msix_desc_addr() helper (Christoph Hellwig)
   - Switch msix_program_entries() to use pci_msix_desc_addr() (Christoph Hellwig)
   - Make the "entries" argument to pci_enable_msix() optional (Christoph Hellwig)
   - Provide sensible IRQ vector alloc/free routines (Christoph Hellwig)
   - Spread interrupt vectors in pci_alloc_irq_vectors() (Christoph Hellwig)

  Error Handling:
   - Bind DPC to Root Ports as well as Downstream Ports (Keith Busch)
   - Remove DPC tristate module option (Keith Busch)
   - Convert Downstream Port Containment driver to use devm_* functions (Mika Westerberg)

  Generic host bridge driver:
   - Select IRQ_DOMAIN (Arnd Bergmann)
   - Claim bus resources on PCI_PROBE_ONLY set-ups (Lorenzo Pieralisi)

  ACPI host bridge driver:
   - Add ARM64 acpi_pci_bus_find_domain_nr() (Tomasz Nowicki)
   - Add ARM64 ACPI support for legacy IRQs parsing and consolidation with DT code (Tomasz Nowicki)
   - Implement ARM64 AML accessors for PCI_Config region (Tomasz Nowicki)
   - Support ARM64 ACPI-based PCI host controller (Tomasz Nowicki)

  Altera host bridge driver:
   - Check link status before retrain link (Ley Foon Tan)
   - Poll for link up status after retraining the link (Ley Foon Tan)

  Axis ARTPEC-6 host bridge driver:
   - Add PCI_MSI_IRQ_DOMAIN dependency (Arnd Bergmann)
   - Add DT binding for Axis ARTPEC-6 PCIe controller (Niklas Cassel)
   - Add Axis ARTPEC-6 PCIe controller driver (Niklas Cassel)

  Intel VMD host bridge driver:
   - Use lock save/restore in interrupt enable path (Jon Derrick)
   - Select device dma ops to override (Keith Busch)
   - Initialize list item in IRQ disable (Keith Busch)
   - Use x86_vector_domain as parent domain (Keith Busch)
   - Separate MSI and MSI-X vector sharing (Keith Busch)

  Marvell Aardvark host bridge driver:
   - Add DT binding for the Aardvark PCIe controller (Thomas Petazzoni)
   - Add Aardvark PCI host controller driver (Thomas Petazzoni)
   - Add Aardvark PCIe support for Armada 3700 (Thomas Petazzoni)

  Microsoft Hyper-V host bridge driver:
   - Fix interrupt cleanup path (Cathy Avery)
   - Don't leak buffer in hv_pci_onchannelcallback() (Vitaly Kuznetsov)
   - Handle all pending messages in hv_pci_onchannelcallback() (Vitaly Kuznetsov)

  NVIDIA Tegra host bridge driver:
   - Program PADS_REFCLK_CFG* always, not just on legacy SoCs (Stephen Warren)
   - Program PADS_REFCLK_CFG* registers with per-SoC values (Stephen Warren)
   - Use lower-case hex consistently for register definitions (Thierry Reding)
   - Use generic pci_remap_iospace() rather than ARM32-specific one (Thierry Reding)
   - Stop setting pcibios_min_mem (Thierry Reding)

  Renesas R-Car host bridge driver:
   - Drop gen2 dummy I/O port region (Bjorn Helgaas)

  TI DRA7xx host bridge driver:
   - Fix return value in case of error (Christophe JAILLET)

  Xilinx AXI host bridge driver:
   - Fix return value in case of error (Christophe JAILLET)

  Miscellaneous:
   - Make bus_attr_resource_alignment static (Ben Dooks)
   - Include <asm/dma.h> for isa_dma_bridge_buggy (Ben Dooks)
   - MAINTAINERS: Add file patterns for PCI device tree bindings (Geert Uytterhoeven)
   - Make host bridge drivers explicitly non-modular (Paul Gortmaker)"

* tag 'pci-v4.8-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (125 commits)
  PCI: xgene: Make explicitly non-modular
  PCI: thunder-pem: Make explicitly non-modular
  PCI: thunder-ecam: Make explicitly non-modular
  PCI: tegra: Make explicitly non-modular
  PCI: rcar-gen2: Make explicitly non-modular
  PCI: rcar: Make explicitly non-modular
  PCI: mvebu: Make explicitly non-modular
  PCI: layerscape: Make explicitly non-modular
  PCI: keystone: Make explicitly non-modular
  PCI: hisi: Make explicitly non-modular
  PCI: generic: Make explicitly non-modular
  PCI: designware-plat: Make it explicitly non-modular
  PCI: artpec6: Make explicitly non-modular
  PCI: armada8k: Make explicitly non-modular
  PCI: artpec: Add PCI_MSI_IRQ_DOMAIN dependency
  PCI: Add ACS quirk for Solarflare SFC9220
  arm64: dts: marvell: Add Aardvark PCIe support for Armada 3700
  PCI: aardvark: Add Aardvark PCI host controller driver
  dt-bindings: add DT binding for the Aardvark PCIe controller
  PCI: tegra: Program PADS_REFCLK_CFG* registers with per-SoC values
  ...
2016-08-02 17:12:29 -04:00
Linus Torvalds
f716a85cd6 Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull kbuild updates from Michal Marek:

 - GCC plugin support by Emese Revfy from grsecurity, with a fixup from
   Kees Cook.  The plugins are meant to be used for static analysis of
   the kernel code.  Two plugins are provided already.

 - reduction of the gcc commandline by Arnd Bergmann.

 - IS_ENABLED / IS_REACHABLE macro enhancements by Masahiro Yamada

 - bin2c fix by Michael Tautschnig

 - setlocalversion fix by Wolfram Sang

* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  gcc-plugins: disable under COMPILE_TEST
  kbuild: Abort build on bad stack protector flag
  scripts: Fix size mismatch of kexec_purgatory_size
  kbuild: make samples depend on headers_install
  Kbuild: don't add obj tree in additional includes
  Kbuild: arch: look for generated headers in obtree
  Kbuild: always prefix objtree in LINUXINCLUDE
  Kbuild: avoid duplicate include path
  Kbuild: don't add ../../ to include path
  vmlinux.lds.h: replace config_enabled() with IS_ENABLED()
  kconfig.h: allow to use IS_{ENABLE,REACHABLE} in macro expansion
  kconfig.h: use already defined macros for IS_REACHABLE() define
  export.h: use __is_defined() to check if __KSYM_* is defined
  kconfig.h: use __is_defined() to check if MODULE is defined
  kbuild: setlocalversion: print error to STDERR
  Add sancov plugin
  Add Cyclomatic complexity GCC plugin
  GCC plugin infrastructure
  Shared library support
2016-08-02 16:37:12 -04:00
Linus Torvalds
221bb8a46e - ARM: GICv3 ITS emulation and various fixes. Removal of the old
VGIC implementation.
 
 - s390: support for trapping software breakpoints, nested virtualization
 (vSIE), the STHYI opcode, initial extensions for CPU model support.
 
 - MIPS: support for MIPS64 hosts (32-bit guests only) and lots of cleanups,
 preliminary to this and the upcoming support for hardware virtualization
 extensions.
 
 - x86: support for execute-only mappings in nested EPT; reduced vmexit
 latency for TSC deadline timer (by about 30%) on Intel hosts; support for
 more than 255 vCPUs.
 
 - PPC: bugfixes.
 
 The ugly bit is the conflicts.  A couple of them are simple conflicts due
 to 4.7 fixes, but most of them are with other trees. There was definitely
 too much reliance on Acked-by here.  Some conflicts are for KVM patches
 where _I_ gave my Acked-by, but the worst are for this pull request's
 patches that touch files outside arch/*/kvm.  KVM submaintainers should
 probably learn to synchronize better with arch maintainers, with the
 latter providing topic branches whenever possible instead of Acked-by.
 This is what we do with arch/x86.  And I should learn to refuse pull
 requests when linux-next sends scary signals, even if that means that
 submaintainers have to rebase their branches.
 
 Anyhow, here's the list:
 
 - arch/x86/kvm/vmx.c: handle_pcommit and EXIT_REASON_PCOMMIT was removed
 by the nvdimm tree.  This tree adds handle_preemption_timer and
 EXIT_REASON_PREEMPTION_TIMER at the same place.  In general all mentions
 of pcommit have to go.
 
 There is also a conflict between a stable fix and this patch, where the
 stable fix removed the vmx_create_pml_buffer function and its call.
 
 - virt/kvm/kvm_main.c: kvm_cpu_notifier was removed by the hotplug tree.
 This tree adds kvm_io_bus_get_dev at the same place.
 
 - virt/kvm/arm/vgic.c: a few final bugfixes went into 4.7 before the
 file was completely removed for 4.8.
 
 - include/linux/irqchip/arm-gic-v3.h: this one is entirely our fault;
 this is a change that should have gone in through the irqchip tree and
 pulled by kvm-arm.  I think I would have rejected this kvm-arm pull
 request.  The KVM version is the right one, except that it lacks
 GITS_BASER_PAGES_SHIFT.
 
 - arch/powerpc: what a mess.  For the idle_book3s.S conflict, the KVM
 tree is the right one; everything else is trivial.  In this case I am
 not quite sure what went wrong.  The commit that is causing the mess
 (fd7bacbca4, "KVM: PPC: Book3S HV: Fix TB corruption in guest exit
 path on HMI interrupt", 2016-05-15) touches both arch/powerpc/kernel/
 and arch/powerpc/kvm/.  It's large, but at 396 insertions/5 deletions
 I guessed that it wasn't really possible to split it and that the 5
 deletions wouldn't conflict.  That wasn't the case.
 
 - arch/s390: also messy.  First is hypfs_diag.c where the KVM tree
 moved some code and the s390 tree patched it.  You have to reapply the
 relevant part of commits 6c22c98637, plus all of e030c1125e, to
 arch/s390/kernel/diag.c.  Or pick the linux-next conflict
 resolution from http://marc.info/?l=kvm&m=146717549531603&w=2.
 Second, there is a conflict in gmap.c between a stable fix and 4.8.
 The KVM version here is the correct one.
 
 I have pushed my resolution at refs/heads/merge-20160802 (commit
 3d1f53419842) at git://git.kernel.org/pub/scm/virt/kvm/kvm.git.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJXoGm7AAoJEL/70l94x66DugQIAIj703ePAFepB/fCrKHkZZia
 SGrsBdvAtNsOhr7FQ5qvvjLxiv/cv7CymeuJivX8H+4kuUHUllDzey+RPHYHD9X7
 U6n1PdCH9F15a3IXc8tDjlDdOMNIKJixYuq1UyNZMU6NFwl00+TZf9JF8A2US65b
 x/41W98ilL6nNBAsoDVmCLtPNWAqQ3lajaZELGfcqRQ9ZGKcAYOaLFXHv2YHf2XC
 qIDMf+slBGSQ66UoATnYV2gAopNlWbZ7n0vO6tE2KyvhHZ1m399aBX1+k8la/0JI
 69r+Tz7ZHUSFtmlmyByi5IAB87myy2WQHyAPwj+4vwJkDGPcl0TrupzbG7+T05Y=
 =42ti
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:

 - ARM: GICv3 ITS emulation and various fixes.  Removal of the
   old VGIC implementation.

 - s390: support for trapping software breakpoints, nested
   virtualization (vSIE), the STHYI opcode, initial extensions
   for CPU model support.

 - MIPS: support for MIPS64 hosts (32-bit guests only) and lots
   of cleanups, preliminary to this and the upcoming support for
   hardware virtualization extensions.

 - x86: support for execute-only mappings in nested EPT; reduced
   vmexit latency for TSC deadline timer (by about 30%) on Intel
   hosts; support for more than 255 vCPUs.

 - PPC: bugfixes.

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (302 commits)
  KVM: PPC: Introduce KVM_CAP_PPC_HTM
  MIPS: Select HAVE_KVM for MIPS64_R{2,6}
  MIPS: KVM: Reset CP0_PageMask during host TLB flush
  MIPS: KVM: Fix ptr->int cast via KVM_GUEST_KSEGX()
  MIPS: KVM: Sign extend MFC0/RDHWR results
  MIPS: KVM: Fix 64-bit big endian dynamic translation
  MIPS: KVM: Fail if ebase doesn't fit in CP0_EBase
  MIPS: KVM: Use 64-bit CP0_EBase when appropriate
  MIPS: KVM: Set CP0_Status.KX on MIPS64
  MIPS: KVM: Make entry code MIPS64 friendly
  MIPS: KVM: Use kmap instead of CKSEG0ADDR()
  MIPS: KVM: Use virt_to_phys() to get commpage PFN
  MIPS: Fix definition of KSEGX() for 64-bit
  KVM: VMX: Add VMCS to CPU's loaded VMCSs before VMPTRLD
  kvm: x86: nVMX: maintain internal copy of current VMCS
  KVM: PPC: Book3S HV: Save/restore TM state in H_CEDE
  KVM: PPC: Book3S HV: Pull out TM state save/restore into separate procedures
  KVM: arm64: vgic-its: Simplify MAPI error handling
  KVM: arm64: vgic-its: Make vgic_its_cmd_handle_mapi similar to other handlers
  KVM: arm64: vgic-its: Turn device_id validation into generic ID validation
  ...
2016-08-02 16:11:27 -04:00
Linus Torvalds
731c7d3a20 Merge tag 'drm-for-v4.8' of git://people.freedesktop.org/~airlied/linux
Merge drm updates from Dave Airlie:
 "This is the main drm pull request for 4.8.

  I'm down with a cold at the moment so hopefully this isn't in too bad
  a state, I finished pulling stuff last week mostly (nouveau fixes just
  went in today), so only this message should be influenced by illness.
  Apologies to anyone who's major feature I missed :-)

  Core:
        Lockless GEM BO freeing
        Non-blocking atomic work
        Documentation changes (rst/sphinx)
        Prep for new fencing changes
        Simple display helpers
        Master/auth changes
        Register/unregister rework
        Loads of trivial patches/fixes.

  New stuff:
        ARM Mali display driver (not the 3D chip)
        sii902x RGB->HDMI bridge

  Panel:
        Support for new panels
        Improved backlight support

  Bridge:
        Convert ADV7511 to bridge driver
        ADV7533 support
        TC358767 (DSI/DPI to eDP) encoder chip support

  i915:
        BXT support enabled by default
        GVT-g infrastructure
        GuC command submission and fixes
        BXT workarounds
        SKL/BKL workarounds
        Demidlayering device registration
        Thundering herd fixes
        Missing pci ids
        Atomic updates

  amdgpu/radeon:
        ATPX improvements for better dGPU power control on PX systems
        New power features for CZ/BR/ST
        Pipelined BO moves and evictions in TTM
        GPU scheduler improvements
        GPU reset improvements
        Overclocking on dGPUs with amdgpu
        Polaris powermanagement enabled

  nouveau:
        GK20A/GM20B volt and clock improvements.
        Initial support for GP100/GP104 GPUs, GP104 will not yet support
        acceleration due to NVIDIA having not released firmware for them as of yet.

  exynos:
        Exynos5433 SoC with IOMMU support.

  vc4:
        Shader validation for branching

  imx-drm:
        Atomic mode setting conversion
        Reworked DMFC FIFO allocation
        External bridge support

  analogix-dp:
        RK3399 eDP support
        Lots of fixes.

  rockchip:
        Lots of small fixes.

  msm:
        DT bindings cleanups
        Shrinker and madvise support
        ASoC HDMI codec support

  tegra:
        Host1x driver cleanups
        SOR reworking for DP support
        Runtime PM support

  omapdrm:
        PLL enhancements
        Header refactoring
        Gamma table support

  arcgpu:
        Simulator support

  virtio-gpu:
        Atomic modesetting fixes.

  rcar-du:
        Misc fixes.

  mediatek:
        MT8173 HDMI support

  sti:
        ASOC HDMI codec support
        Minor fixes

  fsl-dcu:
        Suspend/resume support
        Bridge support

  amdkfd:
        Minor fixes.

  etnaviv:
        Enable GPU clock gating

  hisilicon:
        Vblank and other fixes"

* tag 'drm-for-v4.8' of git://people.freedesktop.org/~airlied/linux: (1575 commits)
  drm/nouveau/gr/nv3x: fix instobj write offsets in gr setup
  drm/nouveau/acpi: fix lockup with PCIe runtime PM
  drm/nouveau/acpi: check for function 0x1B before using it
  drm/nouveau/acpi: return supported DSM functions
  drm/nouveau/acpi: ensure matching ACPI handle and supported functions
  drm/nouveau/fbcon: fix font width not divisible by 8
  drm/amd/powerplay: remove enable_clock_power_gatings_tasks from initialize and resume events
  drm/amd/powerplay: move clockgating to after ungating power in pp for uvd/vce
  drm/amdgpu: add query device id and revision id into system info entry at CGS
  drm/amdgpu: add new definition in bif header
  drm/amd/powerplay: rename smum header guards
  drm/amdgpu: enable UVD context buffer for older HW
  drm/amdgpu: fix default UVD context size
  drm/amdgpu: fix incorrect type of info_id
  drm/amdgpu: make amdgpu_cgs_call_acpi_method as static
  drm/amdgpu: comment out unused defaults_staturn_pro static const structure to fix the build
  drm/amdgpu: enable UVD VM only on polaris
  drm/amdgpu: increase timeout of IB test
  drm/amdgpu: add destroy session when generate VCE destroy msg.
  drm/amd: fix deadlock of job_list_lock V2
  ...
2016-08-01 21:44:08 -04:00
Linus Torvalds
aeb35d6b74 Merge branch 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 header cleanups from Ingo Molnar:
 "This tree is a cleanup of the x86 tree reducing spurious uses of
  module.h - which should improve build performance a bit"

* 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, crypto: Restore MODULE_LICENSE() to glue_helper.c so it loads
  x86/apic: Remove duplicated include from probe_64.c
  x86/ce4100: Remove duplicated include from ce4100.c
  x86/headers: Include spinlock_types.h in x8664_ksyms_64.c for missing spinlock_t
  x86/platform: Delete extraneous MODULE_* tags fromm ts5500
  x86: Audit and remove any remaining unnecessary uses of module.h
  x86/kvm: Audit and remove any unnecessary uses of module.h
  x86/xen: Audit and remove any unnecessary uses of module.h
  x86/platform: Audit and remove any unnecessary uses of module.h
  x86/lib: Audit and remove any unnecessary uses of module.h
  x86/kernel: Audit and remove any unnecessary uses of module.h
  x86/mm: Audit and remove any unnecessary uses of module.h
  x86: Don't use module.h just for AUTHOR / LICENSE tags
2016-08-01 14:23:42 -04:00
Bjorn Helgaas
9454c23852 Merge branch 'pci/msi-affinity' into next
Conflicts:
	drivers/nvme/host/pci.c
2016-08-01 12:34:01 -05:00
Bjorn Helgaas
a04bee8285 Merge branches 'pci/host-aardvark', 'pci/host-altera', 'pci/host-dra7xx', 'pci/host-hv', 'pci/host-vmd' and 'pci/host-xilinx' into next
* pci/host-aardvark:
  arm64: dts: marvell: Add Aardvark PCIe support for Armada 3700
  PCI: aardvark: Add Aardvark PCI host controller driver
  dt-bindings: add DT binding for the Aardvark PCIe controller

* pci/host-altera:
  PCI: altera: Poll for link up status after retraining the link
  PCI: altera: Check link status before retrain link
  PCI: altera: Reorder read/write functions

* pci/host-dra7xx:
  PCI: dra7xx: Fix return value in case of error

* pci/host-hv:
  PCI: hv: Fix interrupt cleanup path
  PCI: hv: Handle all pending messages in hv_pci_onchannelcallback()
  PCI: hv: Don't leak buffer in hv_pci_onchannelcallback()

* pci/host-vmd:
  x86/PCI: VMD: Separate MSI and MSI-X vector sharing
  x86/PCI: VMD: Use x86_vector_domain as parent domain
  x86/PCI: VMD: Use lock save/restore in interrupt enable path
  x86/PCI: VMD: Initialize list item in IRQ disable
  x86/PCI: VMD: Select device dma ops to override

* pci/host-xilinx:
  PCI: xilinx: Fix return value in case of error

Manually apply changes from pci/demodularize-hosts and
pci/host-request-windows to drivers/pci/host/pci-aardvark.c
2016-08-01 12:32:13 -05:00
Jim Mattson
b80c76ec98 KVM: VMX: Add VMCS to CPU's loaded VMCSs before VMPTRLD
Kexec needs to know the addresses of all VMCSs that are active on
each CPU, so that it can flush them from the VMCS caches. It is
safe to record superfluous addresses that are not associated with
an active VMCS, but it is not safe to omit an address associated
with an active VMCS.

After a call to vmcs_load, the VMCS that was loaded is active on
the CPU. The VMCS should be added to the CPU's list of active
VMCSs before it is loaded.

Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-08-01 15:14:24 +02:00
David Matlack
4f2777bc97 kvm: x86: nVMX: maintain internal copy of current VMCS
KVM maintains L1's current VMCS in guest memory, at the guest physical
page identified by the argument to VMPTRLD. This makes hairy
time-of-check to time-of-use bugs possible,as VCPUs can be writing
the the VMCS page in memory while KVM is emulating VMLAUNCH and
VMRESUME.

The spec documents that writing to the VMCS page while it is loaded is
"undefined". Therefore it is reasonable to load the entire VMCS into
an internal cache during VMPTRLD and ignore writes to the VMCS page
-- the guest should be using VMREAD and VMWRITE to access the current
VMCS.

To adhere to the spec, KVM should flush the current VMCS during VMPTRLD,
and the target VMCS during VMCLEAR (as given by the operand to VMCLEAR).
Since this implementation of VMCS caching only maintains the the current
VMCS, VMCLEAR will only do a flush if the operand to VMCLEAR is the
current VMCS pointer.

KVM will also flush during VMXOFF, which is not mandated by the spec,
but also not in conflict with the spec.

Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-01 14:49:05 +02:00
Juergen Gross
005bd0077a perf/x86: Modify error message in virtualized environment
It is known that PMU isn't working in some virtualized environments.

Modify the message issued in that case to mention why hardware PMU
isn't usable instead of reporting it to be broken.

As a side effect this will correct a little bug in the error message:
The error message was meant to be either of level err or info
depending on the environment (native or virtualized). As the level is
taken from the format string and not the printed string, specifying
it via %s and a conditional argument didn't work the way intended.

Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Link: http://lkml.kernel.org/r/1470051427-16795-1-git-send-email-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-01 14:30:34 +02:00
Borislav Petkov
b3830e8d47 x86/entry: Remove duplicated comment
Ok, ok, we see it is called from C :-)

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160801100502.29796-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-01 12:33:08 +02:00
David Howells
f7d665627e x86/syscalls/64: Add compat_sys_keyctl for 32-bit userspace
x86_64 needs to use compat_sys_keyctl for 32-bit userspace rather than
calling sys_keyctl(). The latter will work in a lot of cases, thereby
hiding the issue.

Reported-by: Stephan Mueller <smueller@chronox.de>
Tested-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keyrings@vger.kernel.org
Cc: linux-security-module@vger.kernel.org
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/146961615805.14395.5581949237156769439.stgit@warthog.procyon.org.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-01 11:31:24 +02:00
Ingo Molnar
7946c2beac Merge branch 'x86/asm' into x86/urgent, to pick up fix
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-01 09:30:19 +02:00
Linus Torvalds
d761f3ed6e Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode updates from Thomas Gleixner:

 - more work to make the microcode loader robust

 - a fix for the micro code load precedence

 - fixes for initrd loading with randomized memory

 - less printk noise on SMP machines

* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asm, x86/microcode: Add __PAGE_OFFSET_BASE define on 32-bit
  x86/microcode/intel: Fix initrd loading with CONFIG_RANDOMIZE_MEMORY=y
  x86/microcode: Remove unused symbol exports
  x86/microcode/intel: Do not issue microcode updates messages on each CPU
  Documentation/microcode: Document some aspects for more clarity
  x86/microcode/AMD: Make amd_ucode_patch[] static
  x86/microcode/intel: Unexport save_mc_for_early()
  x86/microcode/intel: Rename load_microcode_early() to find_microcode_patch()
  x86/microcode: Propagate save_microcode_in_initrd() retval
  x86/microcode: Get rid of find_cpio_data()'s dummy offset arg
  lib/cpio: Make find_cpio_data()'s offset arg optional
  x86/microcode: Fix suspend to RAM with builtin microcode
  x86/microcode: Fix loading precedence
2016-07-30 13:18:33 -07:00
Linus Torvalds
b325e04ea2 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpufeature updates from Thomas Gleixner:

 - a workaround for the MONITOR instruction erratum of Goldmont CPUs

 - small fixes and cleanups here and there

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Add workaround for MONITOR instruction erratum on Goldmont based CPUs
  x86/cpu: Rename "WESTMERE2" family to "NEHALEM_G"
  x86/amd_nb: Clean up init path
  x86/cpufeature: Add helper macro for mask check macros
  x86/cpufeature: Make sure DISABLED/REQUIRED macros are updated
  x86/cpufeature: Update cpufeaure macros
2016-07-30 12:56:26 -07:00
Linus Torvalds
7a1e8b80fb Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
 "Highlights:

   - TPM core and driver updates/fixes
   - IPv6 security labeling (CALIPSO)
   - Lots of Apparmor fixes
   - Seccomp: remove 2-phase API, close hole where ptrace can change
     syscall #"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (156 commits)
  apparmor: fix SECURITY_APPARMOR_HASH_DEFAULT parameter handling
  tpm: Add TPM 2.0 support to the Nuvoton i2c driver (NPCT6xx family)
  tpm: Factor out common startup code
  tpm: use devm_add_action_or_reset
  tpm2_i2c_nuvoton: add irq validity check
  tpm: read burstcount from TPM_STS in one 32-bit transaction
  tpm: fix byte-order for the value read by tpm2_get_tpm_pt
  tpm_tis_core: convert max timeouts from msec to jiffies
  apparmor: fix arg_size computation for when setprocattr is null terminated
  apparmor: fix oops, validate buffer size in apparmor_setprocattr()
  apparmor: do not expose kernel stack
  apparmor: fix module parameters can be changed after policy is locked
  apparmor: fix oops in profile_unpack() when policy_db is not present
  apparmor: don't check for vmalloc_addr if kvzalloc() failed
  apparmor: add missing id bounds check on dfa verification
  apparmor: allow SYS_CAP_RESOURCE to be sufficient to prlimit another task
  apparmor: use list_next_entry instead of list_entry_next
  apparmor: fix refcount race when finding a child profile
  apparmor: fix ref count leak when profile sha1 hash is read
  apparmor: check that xindex is in trans_table bounds
  ...
2016-07-29 17:38:46 -07:00
Linus Torvalds
601f887d61 Power management fix for v4.8-rc1
Fix a nasty (and really hard to debug) memory corruption during
 resume from hibernation on x86-64 (that leads to a kernel panic
 most of the time) due to the use of a stale stack pointer value
 in FRAME_BEGIN (Josh Poimboeuf).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJXm76YAAoJEILEb/54YlRxBu0QAIwOnS0YJO/AuXNNEUxzK3xu
 /+nRda38MJVUeoU/Z8wBkf5l8XEzNgYZPabj87r4Dkyg32dWKXHwW8R9ApyO1RfA
 7CReTgO2rKfOrDWCq2uYdea0hKM36vavjUHGJ+fSScaqHWaGSqvqeHusroUDL+3y
 grRI8GDNyfG1eV9IxrJwcE0Oegp0QKX5saSKXXVoeaM63YnNUwv9usdEOpdE1lmk
 r6WwpYNZSh/vjaNlLEcrmEAbU3Nv/wBNqGBkoZoATgTT0YsONfqUJcU3fhVXeNMR
 u3Tnk2HJZUmJ3HQgFBcpGU5r1/c/qX8PzPK7e3D60QDgztkADCroW9WOYfwICs+A
 qmsCUs0bVyqxUOBWLIbyJ/n9/VzkeE/cD3lOjkGh/ChRZWpUeDobJWeoseMxDzG5
 21kOwEQvel9Itk8C57zoWCvuSNddg9M2f/GL7yoYtyqKrMriRwwqftIOhJocySP0
 2eONtBn2be9IUX/cdSuVVnqvWP9pGCXqr4IJZoormX3lf1Zi+/G9buPo0At35k5F
 kHxRhnAGsnYxolAhVboxQdkGItrP0LlDvXLnLzPhWjz8qOLOkZpnIfpjsj+bwF7K
 2S+Yr3oBldxIYu1A+jne/fOiTUiQ5iAicWnLwNQpmETWu/+Rz72bweWRUYiE71mW
 H0M0HaD9UzhockpqpeEM
 =O25M
 -----END PGP SIGNATURE-----

Merge tag 'pm-urgent-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fix from Rafael Wysocki:
 "Fix a nasty (and really hard to debug) memory corruption during resume
  from hibernation on x86-64 (that leads to a kernel panic most of the
  time) due to the use of a stale stack pointer value in FRAME_BEGIN
  (Josh Poimboeuf)"

* tag 'pm-urgent-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  x86/power/64: Fix hibernation return address corruption
2016-07-29 14:34:55 -07:00
Linus Torvalds
a6408f6cb6 Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull smp hotplug updates from Thomas Gleixner:
 "This is the next part of the hotplug rework.

   - Convert all notifiers with a priority assigned

   - Convert all CPU_STARTING/DYING notifiers

     The final removal of the STARTING/DYING infrastructure will happen
     when the merge window closes.

  Another 700 hundred line of unpenetrable maze gone :)"

* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
  timers/core: Correct callback order during CPU hot plug
  leds/trigger/cpu: Move from CPU_STARTING to ONLINE level
  powerpc/numa: Convert to hotplug state machine
  arm/perf: Fix hotplug state machine conversion
  irqchip/armada: Avoid unused function warnings
  ARC/time: Convert to hotplug state machine
  clocksource/atlas7: Convert to hotplug state machine
  clocksource/armada-370-xp: Convert to hotplug state machine
  clocksource/exynos_mct: Convert to hotplug state machine
  clocksource/arm_global_timer: Convert to hotplug state machine
  rcu: Convert rcutree to hotplug state machine
  KVM/arm/arm64/vgic-new: Convert to hotplug state machine
  smp/cfd: Convert core to hotplug state machine
  x86/x2apic: Convert to CPU hotplug state machine
  profile: Convert to hotplug state machine
  timers/core: Convert to hotplug state machine
  hrtimer: Convert to hotplug state machine
  x86/tboot: Convert to hotplug state machine
  arm64/armv8 deprecated: Convert to hotplug state machine
  hwtracing/coresight-etm4x: Convert to hotplug state machine
  ...
2016-07-29 13:55:30 -07:00
Rafael J. Wysocki
e148d0f85c Merge branch 'pm-sleep'
* pm-sleep:
  x86/power/64: Fix hibernation return address corruption
2016-07-29 22:12:54 +02:00
Josh Poimboeuf
4ce827b4cc x86/power/64: Fix hibernation return address corruption
In kernel bug 150021, a kernel panic was reported when restoring a
hibernate image.  Only a picture of the oops was reported, so I can't
paste the whole thing here.  But here are the most interesting parts:

  kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
  BUG: unable to handle kernel paging request at ffff8804615cfd78
  ...
  RIP: ffff8804615cfd78
  RSP: ffff8804615f0000
  RBP: ffff8804615cfdc0
  ...
  Call Trace:
   do_signal+0x23
   exit_to_usermode_loop+0x64
   ...

The RIP is on the same page as RBP, so it apparently started executing
on the stack.

The bug was bisected to commit ef0f3ed5a4 (x86/asm/power: Create
stack frames in hibernate_asm_64.S), which in retrospect seems quite
dangerous, since that code saves and restores the stack pointer from a
global variable ('saved_context').

There are a lot of moving parts in the hibernate save and restore paths,
so I don't know exactly what caused the panic.  Presumably, a FRAME_END
was executed without the corresponding FRAME_BEGIN, or vice versa.  That
would corrupt the return address on the stack and would be consistent
with the details of the above panic.

[ rjw: One major problem is that by the time the FRAME_BEGIN in
  restore_registers() is executed, the stack pointer value may not
  be valid any more.  Namely, the stack area pointed to by it
  previously may have been overwritten by some image memory contents
  and that page frame may now be used for whatever different purpose
  it had been allocated for before hibernation.  In that case, the
  FRAME_BEGIN will corrupt that memory. ]

Instead of doing the frame pointer save/restore around the bounds of the
affected functions, just do it around the call to swsusp_save().

That has the same effect of ensuring that if swsusp_save() sleeps, the
frame pointers will be correct.  It's also a much more obviously safe
way to do it than the original patch.  And objtool still doesn't report
any warnings.

Fixes: ef0f3ed5a4 (x86/asm/power: Create stack frames in hibernate_asm_64.S)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=150021
Cc: 4.6+ <stable@vger.kernel.org> # 4.6+
Reported-by: Andre Reinke <andre.reinke@mailbox.org>
Tested-by: Andre Reinke <andre.reinke@mailbox.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-29 13:38:59 +02:00
Linus Torvalds
f0c98ebc57 libnvdimm for 4.8
1/ Replace pcommit with ADR / directed-flushing:
    The pcommit instruction, which has not shipped on any product, is
    deprecated. Instead, the requirement is that platforms implement either
    ADR, or provide one or more flush addresses per nvdimm. ADR
    (Asynchronous DRAM Refresh) flushes data in posted write buffers to the
    memory controller on a power-fail event. Flush addresses are defined in
    ACPI 6.x as an NVDIMM Firmware Interface Table (NFIT) sub-structure:
    "Flush Hint Address Structure". A flush hint is an mmio address that
    when written and fenced assures that all previous posted writes
    targeting a given dimm have been flushed to media.
 
 2/ On-demand ARS (address range scrub):
    Linux uses the results of the ACPI ARS commands to track bad blocks
    in pmem devices.  When latent errors are detected we re-scrub the media
    to refresh the bad block list, userspace can also request a re-scrub at
    any time.
 
 3/ Support for the Microsoft DSM (device specific method) command format.
 
 4/ Support for EDK2/OVMF virtual disk device memory ranges.
 
 5/ Various fixes and cleanups across the subsystem.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXmXBsAAoJEB7SkWpmfYgCEwwP/1IOt9ocP+iHLMDH9KE7VaTZ
 NmUDR+Zy6g5cRQM7SgcuU5BXUcx+OsSrSrUTVF1cW994o9Gbz1mFotkv0ZAsPcYY
 ZVRQxo2oqHrssyOcg+PsgKWiXn68rJOCgmpEyzaJywl5qTMst7pzsT1s1f7rSh6h
 trCf4VaJJwxZR8fARGtlHUnnhPe2Orp99EZRKEWprAsIv2kPuWpPHSjRjuEgN1JG
 KW8AYwWqFTtiLRUk86I4KBB0wcDrfctsjgN9Ogd6+aHyQBRnVSr2U+vDCFkC8KLu
 qiDCpYp+yyxBjclnljz7tRRT3GtzfCUWd4v2KVWqgg2IaobUc0Lbukp/rmikUXQP
 WLikT2OCQ994eFK5OX3Q3cIU/4j459TQnof8q14yVSpjAKrNUXVSR5puN7Hxa+V7
 41wKrAsnsyY1oq+Yd/rMR8VfH7PHx3bFkrmRCGZCufLX1UQm4aYj+sWagDKiV3yA
 DiudghbOnhfurfGsnXUVw7y7GKs+gNWNBmB6ndAD6ZEHmKoGUhAEbJDLCc3DnANl
 b/2mv1MIdIcC1DlCmnbbcn6fv6bICe/r8poK3VrCK3UgOq/EOvKIWl7giP+k1JuC
 6DdVYhlNYIVFXUNSLFAwz8OkLu8byx7WDm36iEqrKHtPw+8qa/2bWVgOU6OBgpjV
 cN3edFVIdxvZeMgM5Ubq
 =xCBG
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:

 - Replace pcommit with ADR / directed-flushing.

   The pcommit instruction, which has not shipped on any product, is
   deprecated.  Instead, the requirement is that platforms implement
   either ADR, or provide one or more flush addresses per nvdimm.

   ADR (Asynchronous DRAM Refresh) flushes data in posted write buffers
   to the memory controller on a power-fail event.

   Flush addresses are defined in ACPI 6.x as an NVDIMM Firmware
   Interface Table (NFIT) sub-structure: "Flush Hint Address Structure".
   A flush hint is an mmio address that when written and fenced assures
   that all previous posted writes targeting a given dimm have been
   flushed to media.

 - On-demand ARS (address range scrub).

   Linux uses the results of the ACPI ARS commands to track bad blocks
   in pmem devices.  When latent errors are detected we re-scrub the
   media to refresh the bad block list, userspace can also request a
   re-scrub at any time.

 - Support for the Microsoft DSM (device specific method) command
   format.

 - Support for EDK2/OVMF virtual disk device memory ranges.

 - Various fixes and cleanups across the subsystem.

* tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (41 commits)
  libnvdimm-btt: Delete an unnecessary check before the function call "__nd_device_register"
  nfit: do an ARS scrub on hitting a latent media error
  nfit: move to nfit/ sub-directory
  nfit, libnvdimm: allow an ARS scrub to be triggered on demand
  libnvdimm: register nvdimm_bus devices with an nd_bus driver
  pmem: clarify a debug print in pmem_clear_poison
  x86/insn: remove pcommit
  Revert "KVM: x86: add pcommit support"
  nfit, tools/testing/nvdimm/: unify shutdown paths
  libnvdimm: move ->module to struct nvdimm_bus_descriptor
  nfit: cleanup acpi_nfit_init calling convention
  nfit: fix _FIT evaluation memory leak + use after free
  tools/testing/nvdimm: add manufacturing_{date|location} dimm properties
  tools/testing/nvdimm: add virtual ramdisk range
  acpi, nfit: treat virtual ramdisk SPA as pmem region
  pmem: kill __pmem address space
  pmem: kill wmb_pmem()
  libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes
  fs/dax: remove wmb_pmem()
  libnvdimm, pmem: flush posted-write queues on shutdown
  ...
2016-07-28 17:38:16 -07:00
Linus Torvalds
468fc7ed55 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Unified UDP encapsulation offload methods for drivers, from
    Alexander Duyck.

 2) Make DSA binding more sane, from Andrew Lunn.

 3) Support QCA9888 chips in ath10k, from Anilkumar Kolli.

 4) Several workqueue usage cleanups, from Bhaktipriya Shridhar.

 5) Add XDP (eXpress Data Path), essentially running BPF programs on RX
    packets as soon as the device sees them, with the option to mirror
    the packet on TX via the same interface.  From Brenden Blanco and
    others.

 6) Allow qdisc/class stats dumps to run lockless, from Eric Dumazet.

 7) Add VLAN support to b53 and bcm_sf2, from Florian Fainelli.

 8) Simplify netlink conntrack entry layout, from Florian Westphal.

 9) Add ipv4 forwarding support to mlxsw spectrum driver, from Ido
    Schimmel, Yotam Gigi, and Jiri Pirko.

10) Add SKB array infrastructure and convert tun and macvtap over to it.
    From Michael S Tsirkin and Jason Wang.

11) Support qdisc packet injection in pktgen, from John Fastabend.

12) Add neighbour monitoring framework to TIPC, from Jon Paul Maloy.

13) Add NV congestion control support to TCP, from Lawrence Brakmo.

14) Add GSO support to SCTP, from Marcelo Ricardo Leitner.

15) Allow GRO and RPS to function on macsec devices, from Paolo Abeni.

16) Support MPLS over IPV4, from Simon Horman.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits)
  xgene: Fix build warning with ACPI disabled.
  be2net: perform temperature query in adapter regardless of its interface state
  l2tp: Correctly return -EBADF from pppol2tp_getname.
  net/mlx5_core/health: Remove deprecated create_singlethread_workqueue
  net: ipmr/ip6mr: update lastuse on entry change
  macsec: ensure rx_sa is set when validation is disabled
  tipc: dump monitor attributes
  tipc: add a function to get the bearer name
  tipc: get monitor threshold for the cluster
  tipc: make cluster size threshold for monitoring configurable
  tipc: introduce constants for tipc address validation
  net: neigh: disallow transition to NUD_STALE if lladdr is unchanged in neigh_update()
  MAINTAINERS: xgene: Add driver and documentation path
  Documentation: dtb: xgene: Add MDIO node
  dtb: xgene: Add MDIO node
  drivers: net: xgene: ethtool: Use phy_ethtool_gset and sset
  drivers: net: xgene: Use exported functions
  drivers: net: xgene: Enable MDIO driver
  drivers: net: xgene: Add backward compatibility
  drivers: net: phy: xgene: Add MDIO driver
  ...
2016-07-27 12:03:20 -07:00
Linus Torvalds
08fd8c1768 xen: features and fixes for 4.8-rc0
- ACPI support for guests on ARM platforms.
 - Generic steal time support for arm and x86.
 - Support cases where kernel cpu is not Xen VCPU number (e.g., if
   in-guest kexec is used).
 - Use the system workqueue instead of a custom workqueue in various
   places.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXmLlrAAoJEFxbo/MsZsTRvRQH/1wOMF8BmlbZfR7H3qwDfjst
 ApNifCiZE08xDtWBlwUaBFAQxyflQS9BBiNZDVK0sysIdXeOdpWV7V0ZjRoLL+xr
 czsaaGXDcmXxJxApoMDVuT7FeP6rEk6LVAYRoHpVjJjMZGW3BbX1vZaMW4DXl2WM
 9YNaF2Lj+rpc1f8iG31nUxwkpmcXFog6ct4tu7HiyCFT3hDkHt/a4ghuBdQItCkd
 vqBa1pTpcGtQBhSmWzlylN/PV2+NKcRd+kGiwd09/O/rNzogTMCTTWeHKAtMpPYb
 Cu6oSqJtlK5o0vtr0qyLSWEGIoyjE2gE92s0wN3iCzFY1PldqdsxUO622nIj+6o=
 =G6q3
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.8-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from David Vrabel:
 "Features and fixes for 4.8-rc0:

   - ACPI support for guests on ARM platforms.
   - Generic steal time support for arm and x86.
   - Support cases where kernel cpu is not Xen VCPU number (e.g., if
     in-guest kexec is used).
   - Use the system workqueue instead of a custom workqueue in various
     places"

* tag 'for-linus-4.8-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (47 commits)
  xen: add static initialization of steal_clock op to xen_time_ops
  xen/pvhvm: run xen_vcpu_setup() for the boot CPU
  xen/evtchn: use xen_vcpu_id mapping
  xen/events: fifo: use xen_vcpu_id mapping
  xen/events: use xen_vcpu_id mapping in events_base
  x86/xen: use xen_vcpu_id mapping when pointing vcpu_info to shared_info
  x86/xen: use xen_vcpu_id mapping for HYPERVISOR_vcpu_op
  xen: introduce xen_vcpu_id mapping
  x86/acpi: store ACPI ids from MADT for future usage
  x86/xen: update cpuid.h from Xen-4.7
  xen/evtchn: add IOCTL_EVTCHN_RESTRICT
  xen-blkback: really don't leak mode property
  xen-blkback: constify instance of "struct attribute_group"
  xen-blkfront: prefer xenbus_scanf() over xenbus_gather()
  xen-blkback: prefer xenbus_scanf() over xenbus_gather()
  xen: support runqueue steal time on xen
  arm/xen: add support for vm_assist hypercall
  xen: update xen headers
  xen-pciback: drop superfluous variables
  xen-pciback: short-circuit read path used for merging write values
  ...
2016-07-27 11:35:37 -07:00
Borislav Petkov
4a1a8e1b8f x86/asm, x86/microcode: Add __PAGE_OFFSET_BASE define on 32-bit
... in order to avoid #ifdeffery in code computing the ASLR randomization
offset. Remove that #ifdeffery in the microcode loader.

Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicolai Stange <nicstange@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160727120939.GA18911@nazgul.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-27 14:59:59 +02:00
Ingo Molnar
df15929f8f Merge branch 'linus' into x86/microcode, to pick up merge window changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-27 12:35:35 +02:00
Andy Lutomirski
609c19a385 x86/ptrace: Stop setting TS_COMPAT in ptrace code
Setting TS_COMPAT in ptrace is wrong: if we happen to do it during
syscall entry, then we'll confuse seccomp and audit.  (The former
isn't a security problem: seccomp is currently entirely insecure if a
malicious ptracer is attached.)  As a minimal fix, this patch adds a
new flag TS_I386_REGS_POKED that handles the ptrace special case.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pedro Alves <palves@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/5383ebed38b39fa37462139e337aff7f2314d1ca.1469599803.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-27 11:09:43 +02:00
Linus Torvalds
0e06f5c0de Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - a few misc bits

 - ocfs2

 - most(?) of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (125 commits)
  thp: fix comments of __pmd_trans_huge_lock()
  cgroup: remove unnecessary 0 check from css_from_id()
  cgroup: fix idr leak for the first cgroup root
  mm: memcontrol: fix documentation for compound parameter
  mm: memcontrol: remove BUG_ON in uncharge_list
  mm: fix build warnings in <linux/compaction.h>
  mm, thp: convert from optimistic swapin collapsing to conservative
  mm, thp: fix comment inconsistency for swapin readahead functions
  thp: update Documentation/{vm/transhuge,filesystems/proc}.txt
  shmem: split huge pages beyond i_size under memory pressure
  thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE
  khugepaged: add support of collapse for tmpfs/shmem pages
  shmem: make shmem_inode_info::lock irq-safe
  khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page()
  thp: extract khugepaged from mm/huge_memory.c
  shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings
  shmem: add huge pages support
  shmem: get_unmapped_area align huge page
  shmem: prepare huge= mount option and sysfs knob
  mm, rmap: account shmem thp pages
  ...
2016-07-26 19:55:54 -07:00
Linus Torvalds
e663107fa1 ACPI material for v4.8-rc1
- Support for ACPI SSDT overlays allowing Secondary System
    Description Tables (SSDTs) to be loaded at any time from EFI
    variables or via configfs (Octavian Purdila, Mika Westerberg).
 
  - Support for the ACPI LPI (Low-Power Idle) feature introduced in
    ACPI 6.0 and allowing processor idle states to be represented in
    ACPI tables in a hierarchical way (with the help of Processor
    Container objects) and support for ACPI idle states management
    on ARM64, based on LPI (Sudeep Holla).
 
  - General improvements of ACPI support for NUMA and ARM64 support
    for ACPI-based NUMA (Hanjun Guo, David Daney, Robert Richter).
 
  - General improvements of the ACPI table upgrade mechanism and
    ARM64 support for that feature (Aleksey Makarov, Jon Masters).
 
  - Support for the Boot Error Record Table (BERT) in APEI and
    improvements of kernel messages printed by the error injection
    code (Huang Ying, Borislav Petkov).
 
  - New driver for the Intel Broxton WhiskeyCove PMIC operation
    region and support for the REGS operation region on Broxton,
    PMIC code cleanups (Bin Gao, Felipe Balbi, Paul Gortmaker).
 
  - New driver for the power participant device which is part of the
    Dynamic Power and Thermal Framework (DPTF) and DPTF-related code
    reorganization (Srinivas Pandruvada).
 
  - Support for the platform-initiated graceful shutdown feature
    introduced in ACPI 6.1 (Prashanth Prakash).
 
  - ACPI button driver update related to lid input events generated
    automatically on initialization and system resume that have been
    problematic for some time (Lv Zheng).
 
  - ACPI EC driver cleanups (Lv Zheng).
 
  - Documentation of the ACPICA release automation process and the
    in-kernel ACPI AML debugger (Lv Zheng).
 
  - New blacklist entry and two fixes for the ACPI backlight driver
    (Alex Hung, Arvind Yadav, Ralf Gerbig).
 
  - Cleanups of the ACPI pci_slot driver (Joe Perches, Paul Gortmaker).
 
  - ACPI CPPC code changes to make it more robust against possible
    defects in ACPI tables and new symbol definitions for PCC (Hoan
    Tran).
 
  - System reboot code modification to execute the ACPI _PTS (Prepare
    To Sleep) method in addition to _TTS (Ocean He).
 
  - ACPICA-related change to carry out lock ordering checks in ACPICA
    if ACPICA debug is enabled in the kernel (Lv Zheng).
 
  - Assorted minor fixes and cleanups (Andy Shevchenko, Baoquan He,
    Bhaktipriya Shridhar, Paul Gortmaker, Rafael Wysocki).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJXl8A7AAoJEILEb/54YlRxF0kQAI6mH0yan60Osu4598+VNvgv
 wxOWl1TEbKd+LaJkofRZ+FPzZkQf5c/h/8Oo8Q3LEpFhjkARhhX7ThDjS5v2Nx6v
 I/icQ64ynPUPrw6hGNVrmec9ofZjiAs3j3Rt2bEiae+YN6guvfhWE+kBCHo2G/nN
 o4BSaxYjkphUTDSi4/5BfaocV2sl3apvwjtAj8zgGn4RD81bFFLnblynHkqJVcoN
 HAfm7QTVjT01Zkv565OSZgK8CFcD8Ky2KKKBQvcIW8zQmD6IXaoTHSYSwL0SH+oK
 bxUZUmWVfFWw4kDTAY9mw0QwtWz9ODTWh/WMhs3itWRRN5qHfogs99rCVYFtFufQ
 ODVy4wpt4wmpzZVhyUDTTigAhznPAtCam6EpL1YeNbtyrRN4evvZVFHBZJnmhosX
 zI9iLF4eqdnJZKvh+L1VFU+py8aAZpz1ZEOatNMI+xdhArbGm7v89cldzaRkJhuW
 LZr+JqYQGaOZS5qSnymwJL1KfF66+2QGpzdvzJN5FNIDACoqanATbZ/Iie2ENcM+
 WwCEWrGJFDmM30raBNNcvx0yHFtVkcNbOymla4paVg7i29nu88Ynw4Z6seIIP11C
 DryzLFhw+3jdTg2zK/te/wkhciJ0F+iZjo6VXywSMnwatf36bpdp4r4JLUVfEo2t
 8DOGKyFMLYY1zOPMK9Th
 =YwbM
 -----END PGP SIGNATURE-----

Merge tag 'acpi-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI updates from Rafael Wysocki:
 "The new feaures here are the support for ACPI overlays (allowing ACPI
  tables to be loaded at any time from EFI variables or via configfs)
  and the LPI (Low-Power Idle) support.  Also notable is the ACPI-based
  NUMA support for ARM64.

  Apart from that we have two new drivers, for the DPTF (Dynamic Power
  and Thermal Framework) power participant device and for the Intel
  Broxton WhiskeyCove PMIC, some more PMIC-related changes, support for
  the Boot Error Record Table (BERT) in APEI and support for
  platform-initiated graceful shutdown.

  Plus two new pieces of documentation and usual assorted fixes and
  cleanups in quite a few places.

  Specifics:

   - Support for ACPI SSDT overlays allowing Secondary System
     Description Tables (SSDTs) to be loaded at any time from EFI
     variables or via configfs (Octavian Purdila, Mika Westerberg).

   - Support for the ACPI LPI (Low-Power Idle) feature introduced in
     ACPI 6.0 and allowing processor idle states to be represented in
     ACPI tables in a hierarchical way (with the help of Processor
     Container objects) and support for ACPI idle states management on
     ARM64, based on LPI (Sudeep Holla).

   - General improvements of ACPI support for NUMA and ARM64 support for
     ACPI-based NUMA (Hanjun Guo, David Daney, Robert Richter).

   - General improvements of the ACPI table upgrade mechanism and ARM64
     support for that feature (Aleksey Makarov, Jon Masters).

   - Support for the Boot Error Record Table (BERT) in APEI and
     improvements of kernel messages printed by the error injection code
     (Huang Ying, Borislav Petkov).

   - New driver for the Intel Broxton WhiskeyCove PMIC operation region
     and support for the REGS operation region on Broxton, PMIC code
     cleanups (Bin Gao, Felipe Balbi, Paul Gortmaker).

   - New driver for the power participant device which is part of the
     Dynamic Power and Thermal Framework (DPTF) and DPTF-related code
     reorganization (Srinivas Pandruvada).

   - Support for the platform-initiated graceful shutdown feature
     introduced in ACPI 6.1 (Prashanth Prakash).

   - ACPI button driver update related to lid input events generated
     automatically on initialization and system resume that have been
     problematic for some time (Lv Zheng).

   - ACPI EC driver cleanups (Lv Zheng).

   - Documentation of the ACPICA release automation process and the
     in-kernel ACPI AML debugger (Lv Zheng).

   - New blacklist entry and two fixes for the ACPI backlight driver
     (Alex Hung, Arvind Yadav, Ralf Gerbig).

   - Cleanups of the ACPI pci_slot driver (Joe Perches, Paul Gortmaker).

   - ACPI CPPC code changes to make it more robust against possible
     defects in ACPI tables and new symbol definitions for PCC (Hoan
     Tran).

   - System reboot code modification to execute the ACPI _PTS (Prepare
     To Sleep) method in addition to _TTS (Ocean He).

   - ACPICA-related change to carry out lock ordering checks in ACPICA
     if ACPICA debug is enabled in the kernel (Lv Zheng).

   - Assorted minor fixes and cleanups (Andy Shevchenko, Baoquan He,
     Bhaktipriya Shridhar, Paul Gortmaker, Rafael Wysocki)"

* tag 'acpi-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (71 commits)
  ACPI: enable ACPI_PROCESSOR_IDLE on ARM64
  arm64: add support for ACPI Low Power Idle(LPI)
  drivers: firmware: psci: initialise idle states using ACPI LPI
  cpuidle: introduce CPU_PM_CPU_IDLE_ENTER macro for ARM{32, 64}
  arm64: cpuidle: drop __init section marker to arm_cpuidle_init
  ACPI / processor_idle: Add support for Low Power Idle(LPI) states
  ACPI / processor_idle: introduce ACPI_PROCESSOR_CSTATE
  ACPI / DPTF: move int340x_thermal.c to the DPTF folder
  ACPI / DPTF: Add DPTF power participant driver
  ACPI / lpat: make it explicitly non-modular
  ACPI / dock: make dock explicitly non-modular
  ACPI / PCI: make pci_slot explicitly non-modular
  ACPI / PMIC: remove modular references from non-modular code
  ACPICA: Linux: Enable ACPI_MUTEX_DEBUG for Linux kernel
  ACPI: Rename configfs.c to acpi_configfs.c to prevent link error
  ACPI / debugger: Add AML debugger documentation
  ACPI: Add documentation describing ACPICA release automation
  ACPI: add support for loading SSDTs via configfs
  ACPI: add support for configfs
  efi / ACPI: load SSTDs from EFI variables
  ...
2016-07-26 17:56:45 -07:00
Linus Torvalds
6453dbdda3 Power management material for v4.8-rc1
- Rework the cpufreq governor interface to make it more straightforward
    and modify the conservative governor to avoid using transition
    notifications (Rafael Wysocki).
 
  - Rework the handling of frequency tables by the cpufreq core to make
    it more efficient (Viresh Kumar).
 
  - Modify the schedutil governor to reduce the number of wakeups it
    causes to occur in cases when the CPU frequency doesn't need to be
    changed (Steve Muckle, Viresh Kumar).
 
  - Fix some minor issues and clean up code in the cpufreq core and
    governors (Rafael Wysocki, Viresh Kumar).
 
  - Add Intel Broxton support to the intel_pstate driver (Srinivas
    Pandruvada).
 
  - Fix problems related to the config TDP feature and to the validity
    of the MSR_HWP_INTERRUPT register in intel_pstate (Jan Kiszka,
    Srinivas Pandruvada).
 
  - Make intel_pstate update the cpu_frequency tracepoint even if
    the frequency doesn't change to avoid confusing powertop (Rafael
    Wysocki).
 
  - Clean up the usage of __init/__initdata in intel_pstate, mark some
    of its internal variables as __read_mostly and drop an unused
    structure element from it (Jisheng Zhang, Carsten Emde).
 
  - Clean up the usage of some duplicate MSR symbols in intel_pstate
    and turbostat (Srinivas Pandruvada).
 
  - Update/fix the powernv, s3c24xx and mvebu cpufreq drivers (Akshay
    Adiga, Viresh Kumar, Ben Dooks).
 
  - Fix a regression (introduced during the 4.5 cycle) in the
    pcc-cpufreq driver by reverting the problematic commit (Andreas
    Herrmann).
 
  - Add support for Intel Denverton to intel_idle, clean up Broxton
    support in it and make it explicitly non-modular (Jacob Pan,
    Jan Beulich, Paul Gortmaker).
 
  - Add support for Denverton and Ivy Bridge server to the Intel RAPL
    power capping driver and make it more careful about the handing
    of MSRs that may not be present (Jacob Pan, Xiaolong Wang).
 
  - Fix resume from hibernation on x86-64 by making the CPU offline
    during resume avoid using MONITOR/MWAIT in the "play dead" loop
    which may lead to an inadvertent "revival" of a "dead" CPU and
    a page fault leading to a kernel crash from it (Rafael Wysocki).
 
  - Make memory management during resume from hibernation more
    straightforward (Rafael Wysocki).
 
  - Add debug features that should help to detect problems related
    to hibernation and resume from it (Rafael Wysocki, Chen Yu).
 
  - Clean up hibernation core somewhat (Rafael Wysocki).
 
  - Prevent KASAN from instrumenting the hibernation core which leads
    to large numbers of false-positives from it (James Morse).
 
  - Prevent PM (hibernate and suspend) notifiers from being called
    during the cleanup phase if they have not been called during the
    corresponding preparation phase which is possible if one of the
    other notifiers returns an error at that time (Lianwei Wang).
 
  - Improve suspend-related debug printout in the tasks freezer and
    clean up suspend-related console handling (Roger Lu, Borislav
    Petkov).
 
  - Update the AnalyzeSuspend script in the kernel sources to
    version 4.2 (Todd Brandt).
 
  - Modify the generic power domains framework to make it handle
    system suspend/resume better (Ulf Hansson).
 
  - Make the runtime PM framework avoid resuming devices synchronously
    when user space changes the runtime PM settings for them and
    improve its error reporting (Rafael Wysocki, Linus Walleij).
 
  - Fix error paths in devfreq drivers (exynos, exynos-ppmu, exynos-bus)
    and in the core, make some devfreq code explicitly non-modular and
    change some of it into tristate (Bartlomiej Zolnierkiewicz,
    Peter Chen, Paul Gortmaker).
 
  - Add DT support to the generic PM clocks management code and make
    it export some more symbols (Jon Hunter, Paul Gortmaker).
 
  - Make the PCI PM core code slightly more robust against possible
    driver errors (Andy Shevchenko).
 
  - Make it possible to change DESTDIR and PREFIX in turbostat
    (Andy Shevchenko).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJXl7/dAAoJEILEb/54YlRx+VgQAIQJOWvxKew3Yl02c/sdj9OT
 5VNnFrzGzdcAPofvvG9qGq8B0Es1vYehJpwwOB21ri8EvYv0riIiU1yrqslObojQ
 oaZOkSBpbIoKjGR4CpYA/A+feE+8EqIBdPGd+lx5a6oRdUi7tRVHBG9lyLO3FB/i
 jan1q8dMpZsmu+Y+rVVHGnCVuIlIEqr2ZnZfCwDAulO2Arp/QFAh4kH08ELATvrl
 bkPa25vq7/VMP/vCDzrfZKD5mUuKogIRu/J5wx4py1nE+FB35cKKyqBOgklLwAeY
 UI8vjDhr/myNUs54AZlktOkq47TCYvjvhX9kmOxBjuWqFbRusU012IRek1fYPRIV
 ZqbkqNX7UEVQwunAEg9AyFwyzEtOht93dQDT5RLEd4QzKuM76gmHpLeTGGMzE+nu
 FnmF9JGl4DVwqpZl9yU2+hR2Mt3bP8OF8qYmNiGUB3KO4emPslhSd+6y8liA5Bx2
 SJf0Gb//vaHCh3/uMnwAonYPqRkZvBLOMwuL1VUjNQfRMnQtDdgHMYB1aT/EglPA
 8ww6j4J8rVRLAxvYQ3UEmNA/vBNclKXblRR18+JddEZP9/oX0ATfwnCCUpr839uk
 xxyQhrm4/AI60+PHWCX4GG80YrKdOGTkF7LXCQZanVWjjuyF17rufegZ2YWLT07v
 JU1Cmumfdy2jJluT8xsR
 =uVGz
 -----END PGP SIGNATURE-----

Merge tag 'pm-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael  Wysocki:
 "Again, the majority of changes go into the cpufreq subsystem, but
  there are no big features this time.  The cpufreq changes that stand
  out somewhat are the governor interface rework and improvements
  related to the handling of frequency tables.  Apart from those, there
  are fixes and new device/CPU IDs in drivers, cleanups and an
  improvement of the new schedutil governor.

  Next, there are some changes in the hibernation core, including a fix
  for a nasty problem related to the MONITOR/MWAIT usage by CPU offline
  during resume from hibernation, a few core improvements related to
  memory management during resume, a couple of additional debug features
  and cleanups.

  Finally, we have some fixes and cleanups in the devfreq subsystem,
  generic power domains framework improvements related to system
  suspend/resume, support for some new chips in intel_idle and in the
  power capping RAPL driver, a new version of the AnalyzeSuspend utility
  and some assorted fixes and cleanups.

  Specifics:

   - Rework the cpufreq governor interface to make it more
     straightforward and modify the conservative governor to avoid using
     transition notifications (Rafael Wysocki).

   - Rework the handling of frequency tables by the cpufreq core to make
     it more efficient (Viresh Kumar).

   - Modify the schedutil governor to reduce the number of wakeups it
     causes to occur in cases when the CPU frequency doesn't need to be
     changed (Steve Muckle, Viresh Kumar).

   - Fix some minor issues and clean up code in the cpufreq core and
     governors (Rafael Wysocki, Viresh Kumar).

   - Add Intel Broxton support to the intel_pstate driver (Srinivas
     Pandruvada).

   - Fix problems related to the config TDP feature and to the validity
     of the MSR_HWP_INTERRUPT register in intel_pstate (Jan Kiszka,
     Srinivas Pandruvada).

   - Make intel_pstate update the cpu_frequency tracepoint even if the
     frequency doesn't change to avoid confusing powertop (Rafael
     Wysocki).

   - Clean up the usage of __init/__initdata in intel_pstate, mark some
     of its internal variables as __read_mostly and drop an unused
     structure element from it (Jisheng Zhang, Carsten Emde).

   - Clean up the usage of some duplicate MSR symbols in intel_pstate
     and turbostat (Srinivas Pandruvada).

   - Update/fix the powernv, s3c24xx and mvebu cpufreq drivers (Akshay
     Adiga, Viresh Kumar, Ben Dooks).

   - Fix a regression (introduced during the 4.5 cycle) in the
     pcc-cpufreq driver by reverting the problematic commit (Andreas
     Herrmann).

   - Add support for Intel Denverton to intel_idle, clean up Broxton
     support in it and make it explicitly non-modular (Jacob Pan, Jan
     Beulich, Paul Gortmaker).

   - Add support for Denverton and Ivy Bridge server to the Intel RAPL
     power capping driver and make it more careful about the handing of
     MSRs that may not be present (Jacob Pan, Xiaolong Wang).

   - Fix resume from hibernation on x86-64 by making the CPU offline
     during resume avoid using MONITOR/MWAIT in the "play dead" loop
     which may lead to an inadvertent "revival" of a "dead" CPU and a
     page fault leading to a kernel crash from it (Rafael Wysocki).

   - Make memory management during resume from hibernation more
     straightforward (Rafael Wysocki).

   - Add debug features that should help to detect problems related to
     hibernation and resume from it (Rafael Wysocki, Chen Yu).

   - Clean up hibernation core somewhat (Rafael Wysocki).

   - Prevent KASAN from instrumenting the hibernation core which leads
     to large numbers of false-positives from it (James Morse).

   - Prevent PM (hibernate and suspend) notifiers from being called
     during the cleanup phase if they have not been called during the
     corresponding preparation phase which is possible if one of the
     other notifiers returns an error at that time (Lianwei Wang).

   - Improve suspend-related debug printout in the tasks freezer and
     clean up suspend-related console handling (Roger Lu, Borislav
     Petkov).

   - Update the AnalyzeSuspend script in the kernel sources to version
     4.2 (Todd Brandt).

   - Modify the generic power domains framework to make it handle system
     suspend/resume better (Ulf Hansson).

   - Make the runtime PM framework avoid resuming devices synchronously
     when user space changes the runtime PM settings for them and
     improve its error reporting (Rafael Wysocki, Linus Walleij).

   - Fix error paths in devfreq drivers (exynos, exynos-ppmu,
     exynos-bus) and in the core, make some devfreq code explicitly
     non-modular and change some of it into tristate (Bartlomiej
     Zolnierkiewicz, Peter Chen, Paul Gortmaker).

   - Add DT support to the generic PM clocks management code and make it
     export some more symbols (Jon Hunter, Paul Gortmaker).

   - Make the PCI PM core code slightly more robust against possible
     driver errors (Andy Shevchenko).

   - Make it possible to change DESTDIR and PREFIX in turbostat (Andy
     Shevchenko)"

* tag 'pm-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (89 commits)
  Revert "cpufreq: pcc-cpufreq: update default value of cpuinfo_transition_latency"
  PM / hibernate: Introduce test_resume mode for hibernation
  cpufreq: export cpufreq_driver_resolve_freq()
  cpufreq: Disallow ->resolve_freq() for drivers providing ->target_index()
  PCI / PM: check all fields in pci_set_platform_pm()
  cpufreq: acpi-cpufreq: use cached frequency mapping when possible
  cpufreq: schedutil: map raw required frequency to driver frequency
  cpufreq: add cpufreq_driver_resolve_freq()
  cpufreq: intel_pstate: Check cpuid for MSR_HWP_INTERRUPT
  intel_pstate: Update cpu_frequency tracepoint every time
  cpufreq: intel_pstate: clean remnant struct element
  PM / tools: scripts: AnalyzeSuspend v4.2
  x86 / hibernate: Use hlt_play_dead() when resuming from hibernation
  cpufreq: powernv: Replacing pstate_id with frequency table index
  intel_pstate: Fix MSR_CONFIG_TDP_x addressing in core_get_max_pstate()
  PM / hibernate: Image data protection during restoration
  PM / hibernate: Add missing braces in __register_nosave_region()
  PM / hibernate: Clean up comments in snapshot.c
  PM / hibernate: Clean up function headers in snapshot.c
  PM / hibernate: Add missing braces in hibernate_setup()
  ...
2016-07-26 17:29:07 -07:00
Kirill A. Shutemov
dcddffd41d mm: do not pass mm_struct into handle_mm_fault
We always have vma->vm_mm around.

Link: http://lkml.kernel.org/r/1466021202-61880-8-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26 16:19:19 -07:00
Vladimir Davydov
3e79ec7ddc arch: x86: charge page tables to kmemcg
Page tables can bite a relatively big chunk off system memory and their
allocations are easy to trigger from userspace, so they should be
accounted to kmemcg.

This patch marks page table allocations as __GFP_ACCOUNT for x86.  Note
we must not charge allocations of kernel page tables, because they can
be shared among processes from different cgroups so accounting them to a
particular one can pin other cgroups for indefinitely long.  So we clear
__GFP_ACCOUNT flag if a page table is allocated for the kernel.

Link: http://lkml.kernel.org/r/7d5c54f6a2bcbe76f03171689440003d87e6c742.1464079538.git.vdavydov@virtuozzo.com
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26 16:19:19 -07:00
Kees Cook
c965b105bf kbuild: abort build on bad stack protector flag
Before, the stack protector flag was sanity checked before .config had
been reprocessed.  This meant the build couldn't be aborted early, and
only a warning could be emitted followed later by the compiler blowing
up with an unknown flag.  This has caused a lot of confusion over time,
so this splits the flag selection from sanity checking and performs the
sanity checking after the make has been restarted from a reprocessed
.config, so builds can be aborted as early as possible now.

Additionally moves the x86-specific sanity check to the same location,
since it suffered from the same warn-then-wait-for-compiler-failure
problem.

Link: http://lkml.kernel.org/r/20160712223043.GA11664@www.outflux.net
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Michal Marek <mmarek@suse.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26 16:19:19 -07:00
Kees Cook
228d96c603 kbuild: Abort build on bad stack protector flag
Before, the stack protector flag was sanity checked before .config had
been reprocessed. This meant the build couldn't be aborted early, and
only a warning could be emitted followed later by the compiler blowing
up with an unknown flag. This has caused a lot of confusion over time,
so this splits the flag selection from sanity checking and performs the
sanity checking after the make has been restarted from a reprocessed
.config, so builds can be aborted as early as possible now.

Additionally moves the x86-specific sanity check to the same location,
since it suffered from the same warn-then-wait-for-compiler-failure
problem.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michal Marek <mmarek@suse.com>
2016-07-26 23:50:59 +02:00
Kees Cook
5b710f34e1 x86/uaccess: Enable hardened usercopy
Enables CONFIG_HARDENED_USERCOPY checks on x86. This is done both in
copy_*_user() and __copy_*_user() because copy_*_user() actually calls
down to _copy_*_user() and not __copy_*_user().

Based on code from PaX and grsecurity.

Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
2016-07-26 14:41:48 -07:00
Kees Cook
0f60a8efe4 mm: Implement stack frame object validation
This creates per-architecture function arch_within_stack_frames() that
should validate if a given object is contained by a kernel stack frame.
Initial implementation is on x86.

This is based on code from PaX.

Signed-off-by: Kees Cook <keescook@chromium.org>
2016-07-26 14:41:47 -07:00
Linus Torvalds
bbce2ad2d7 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
 "Here is the crypto update for 4.8:

  API:
   - first part of skcipher low-level conversions
   - add KPP (Key-agreement Protocol Primitives) interface.

  Algorithms:
   - fix IPsec/cryptd reordering issues that affects aesni
   - RSA no longer does explicit leading zero removal
   - add SHA3
   - add DH
   - add ECDH
   - improve DRBG performance by not doing CTR by hand

  Drivers:
   - add x86 AVX2 multibuffer SHA256/512
   - add POWER8 optimised crc32c
   - add xts support to vmx
   - add DH support to qat
   - add RSA support to caam
   - add Layerscape support to caam
   - add SEC1 AEAD support to talitos
   - improve performance by chaining requests in marvell/cesa
   - add support for Araneus Alea I USB RNG
   - add support for Broadcom BCM5301 RNG
   - add support for Amlogic Meson RNG
   - add support Broadcom NSP SoC RNG"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (180 commits)
  crypto: vmx - Fix aes_p8_xts_decrypt build failure
  crypto: vmx - Ignore generated files
  crypto: vmx - Adding support for XTS
  crypto: vmx - Adding asm subroutines for XTS
  crypto: skcipher - add comment for skcipher_alg->base
  crypto: testmgr - Print akcipher algorithm name
  crypto: marvell - Fix wrong flag used for GFP in mv_cesa_dma_add_iv_op
  crypto: nx - off by one bug in nx_of_update_msc()
  crypto: rsa-pkcs1pad - fix rsa-pkcs1pad request struct
  crypto: scatterwalk - Inline start/map/done
  crypto: scatterwalk - Remove unnecessary BUG in scatterwalk_start
  crypto: scatterwalk - Remove unnecessary advance in scatterwalk_pagedone
  crypto: scatterwalk - Fix test in scatterwalk_done
  crypto: api - Optimise away crypto_yield when hard preemption is on
  crypto: scatterwalk - add no-copy support to copychunks
  crypto: scatterwalk - Remove scatterwalk_bytes_sglen
  crypto: omap - Stop using crypto scatterwalk_bytes_sglen
  crypto: skcipher - Remove top-level givcipher interface
  crypto: user - Remove crypto_lookup_skcipher call
  crypto: cts - Convert to skcipher
  ...
2016-07-26 13:40:17 -07:00
Linus Torvalds
85802a49a8 This merge introduces three patches that are later reverted,
- Switching of MSR_TSC_AUX in SVM was thought to cause a host
    misbehavior, but it was later cleared of those doubts and the patch
    moved code to a hot path, so we reverted it.  That patch also needed
    a fix for 32 bit builds and both were reverted in one go.
 
  - Al Viro noticed that a fix for a leak in an error path was not valid
    with the given API and provided a better fix, so the original patch
    was reverted.
 
 Then there are two VMX fixes that move code around because VMCS was not
 accessed between vcpu_load() and vcpu_put(), a simple ARM VHE fix, and
 two one-liners for PML and MTRR.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABCAAGBQJXljRwAAoJEED/6hsPKofoIZMIAIMm2h5HKplmpT007dVCt1zw
 dG8gO9hxOxstXfGVkNEZvdxyUb0ilFMO5AYySS1ctENpswVAZlKWlyc+aNGsHXOS
 KFylAUNHlibua1xgB64Sitgub8M9Ct5mDfvvqWL79aCgHTLcDxnb/0NprTqB2P3O
 TfsLaiKMDOeZs4nTcs62vNqpPJzoFc6DK2x1RltFGF9RpR7bOD7gnp7KypDWJx7S
 1LleWPHboxHQ40qf8dxAb7HwEARfXndlP6ZoCkf2stoWTwuexHJfesUnsNgEuXnX
 6YJ9mO7np/bHfSDpGMJbb9pPI5g7UDwOzmgvYQvzhak3LRmvjsZePpWchlb0yCs=
 =3VD4
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM leftovers from Radim Krčmář:
 "This is a combination of two pull requests for 4.7-rc8 that were not
  merged due to looking hairy.  I have changed the tag message to focus
  on circumstances of contained reverts as they were likely the reason
  behind rejection.

  This merge introduces three patches that are later reverted,

   - Switching of MSR_TSC_AUX in SVM was thought to cause a host
     misbehavior, but it was later cleared of those doubts and the patch
     moved code to a hot path, so we reverted it.  That patch also
     needed a fix for 32 bit builds and both were reverted in one go.

   - Al Viro noticed that a fix for a leak in an error path was not
     valid with the given API and provided a better fix, so the original
     patch was reverted.

  Then there are two VMX fixes that move code around because VMCS was
  not accessed between vcpu_load() and vcpu_put(), a simple ARM VHE fix,
  and two one-liners for PML and MTRR"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  arm64: KVM: VHE: Context switch MDSCR_EL1
  KVM: VMX: handle PML full VMEXIT that occurs during event delivery
  Revert "KVM: SVM: fix trashing of MSR_TSC_AUX"
  KVM: SVM: do not set MSR_TSC_AUX on 32-bit builds
  KVM: don't use anon_inode_getfd() before possible failures
  Revert "KVM: release anon file in failure path of vm creation"
  KVM: release anon file in failure path of vm creation
  KVM: nVMX: Fix memory corruption when using VMCS shadowing
  kvm: vmx: ensure VMCS is current while enabling PML
  KVM: SVM: fix trashing of MSR_TSC_AUX
  KVM: MTRR: fix kvm_mtrr_check_gfn_range_consistency page fault
2016-07-26 11:50:42 -07:00
Borislav Petkov
efaad554b4 x86/microcode/intel: Fix initrd loading with CONFIG_RANDOMIZE_MEMORY=y
CONFIG_RANDOMIZE_MEMORY=y randomizes the physical memmap and thus the
address where the initrd is located. Therefore, we need to add the
offset KASLR put us to in order to find the initrd again on the AP path.

In the future, we will get rid of the initrd address caching and query
the address on both the BSP and AP paths but that would need more work.

Thanks to Nicolai Stange for the good bisection and debugging work.

Reported-and-tested-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160726095138.3470-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-26 19:32:57 +02:00
Linus Torvalds
37e13a1ebe Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "This tree contains tooling fixes plus some additions:

   - fixes to the vdso2c build environment that Stephen Rothwell is
     using for the linux-next build (Arnaldo Carvalho de Melo)

   - AVX-512 instruction mappings (Adrian Hunter)

   - misc fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "perf tools: event.h needs asm/perf_regs.h"
  x86: Make the vdso2c compiler use the host architecture headers
  tools build: Fix objtool build with ARCH=x86_64
  objtool: Always use host headers
  objtool: Use tools/scripts/Makefile.arch to get ARCH and HOSTARCH
  tools build: Add HOSTARCH Makefile variable
  perf tests kmod-path: Fix build on ubuntu:16.04-x-armhf
  perf tools: Add AVX-512 instructions to the new instructions test
  perf tools: Add AVX-512 support to the instruction decoder used by Intel PT
  x86/insn: Add AVX-512 support to the instruction decoder
  x86/insn: perf tools: Fix vcvtph2ps instruction decoding
2016-07-26 10:26:29 -07:00
Juergen Gross
d34c30cc1f xen: add static initialization of steal_clock op to xen_time_ops
pv_time_ops might be overwritten with xen_time_ops after the
steal_clock operation has been initialized already. To prevent calling
a now uninitialized function pointer add the steal_clock static
initialization to xen_time_ops.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-07-26 14:07:06 +01:00
Dave Airlie
5e580523d9 Linux 4.7
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXlRXSAAoJEHm+PkMAQRiGG/gH/0Z8O4zWOsrwO+X1mRToRDBH
 joFOjAmCVe83T1VpF5LYNB+9+owL/dEDt6+ZIswnhH7AfQPjs4RqwS4PcuMbCDVO
 +mDm0PmfcKaYcQZrB2Z2OwIzRNnfCTVcsDPhIHwuIHk0m4z/xuGZonD8KoAj0+tO
 3yJF6sbE1KubDVjOb+lmZZSP3cXA0pDXrNhkYhE4Tsr8fiihGjeXSNJ8t2zPLjxo
 W3MPqo0rzDvQsOwoF4TWHHagVaFSJlhLBBgqu33fI7uO3jtfQD2G8wG68JCND1j3
 qbMoBfTLFV/yQmSIJUt0Wv1axaCcwnjpweEB35A/GEeZ0mNB1rDdoBeI1eKEQkc=
 =DGFC
 -----END PGP SIGNATURE-----

Backmerge tag 'v4.7' into drm-next

Linux 4.7

As requested by Daniel Vetter as the conflicts were getting messy.
2016-07-26 17:26:29 +10:00
Linus Torvalds
e65805251f Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
 "The irq department delivers:

   - new core infrastructure to allow better management of multi-queue
     devices (interrupt spreading, node aware descriptor allocation ...)

   - a new interrupt flow handler to support the new fangled Intel VMD
     devices.

   - yet another new interrupt controller driver.

   - a series of fixes which addresses sparse warnings, missing
     includes, missing static declarations etc from Ben Dooks.

   - a fix for the error handling in the hierarchical domain allocation
     code.

   - the usual pile of small updates to core and driver code"

* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
  genirq: Fix missing irq allocation affinity hint
  irqdomain: Fix irq_domain_alloc_irqs_recursive() error handling
  irq/Documentation: Correct result of echnoing 5 to smp_affinity
  MAINTAINERS: Remove Jiang Liu from irq domains
  genirq/msi: Fix broken debug output
  genirq: Add a helper to spread an affinity mask for MSI/MSI-X vectors
  genirq/msi: Make use of affinity aware allocations
  genirq: Use affinity hint in irqdesc allocation
  genirq: Add affinity hint to irq allocation
  genirq: Introduce IRQD_AFFINITY_MANAGED flag
  genirq/msi: Remove unused MSI_FLAG_IDENTITY_MAP
  irqchip/s3c24xx: Fixup IO accessors for big endian
  irqchip/exynos-combiner: Fix usage of __raw IO
  irqdomain: Fix disposal of mappings for interrupt hierarchies
  irqchip/aspeed-vic: Add irq controller for Aspeed
  doc/devicetree: Add Aspeed VIC bindings
  x86/PCI/VMD: Use untracked irq handler
  genirq: Add untracked irq handler
  irqchip/mips-gic: Populate irq_domain names
  irqchip/gicv3-its: Implement two-level(indirect) device table support
  ...
2016-07-25 21:35:03 -07:00
Linus Torvalds
55392c4c06 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "This update provides the following changes:

   - The rework of the timer wheel which addresses the shortcomings of
     the current wheel (cascading, slow search for next expiring timer,
     etc).  That's the first major change of the wheel in almost 20
     years since Finn implemted it.

   - A large overhaul of the clocksource drivers init functions to
     consolidate the Device Tree initialization

   - Some more Y2038 updates

   - A capability fix for timerfd

   - Yet another clock chip driver

   - The usual pile of updates, comment improvements all over the place"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (130 commits)
  tick/nohz: Optimize nohz idle enter
  clockevents: Make clockevents_subsys static
  clocksource/drivers/time-armada-370-xp: Fix return value check
  timers: Implement optimization for same expiry time in mod_timer()
  timers: Split out index calculation
  timers: Only wake softirq if necessary
  timers: Forward the wheel clock whenever possible
  timers/nohz: Remove pointless tick_nohz_kick_tick() function
  timers: Optimize collect_expired_timers() for NOHZ
  timers: Move __run_timers() function
  timers: Remove set_timer_slack() leftovers
  timers: Switch to a non-cascading wheel
  timers: Reduce the CPU index space to 256k
  timers: Give a few structs and members proper names
  hlist: Add hlist_is_singular_node() helper
  signals: Use hrtimer for sigtimedwait()
  timers: Remove the deprecated mod_timer_pinned() API
  timers, net/ipv4/inet: Initialize connection request timers as pinned
  timers, drivers/tty/mips_ejtag: Initialize the poll timer as pinned
  timers, drivers/tty/metag_da: Initialize the poll timer as pinned
  ...
2016-07-25 20:43:12 -07:00
Linus Torvalds
c410614c90 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Ingo Molnar:
 "Leftover fix from the v4.7 cycle: adds a reboot quirk"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/reboot: Add Dell Optiplex 7450 AIO reboot quirk
2016-07-25 20:06:38 -07:00
Linus Torvalds
5f22004ba9 Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 timer updates from Ingo Molnar:
 "The main change in this tree is the reworking, fixing and extension of
  the TSC frequency enumeration code (by Len Brown)"

* 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tsc: Remove the unused check_tsc_disabled()
  x86/tsc: Enumerate BXT tsc_khz via CPUID
  x86/tsc: Enumerate SKL cpu_khz and tsc_khz via CPUID
  x86/tsc_msr: Remove irqoff around MSR-based TSC enumeration
  x86/tsc_msr: Add Airmont reference clock values
  x86/tsc_msr: Correct Silvermont reference clock values
  x86/tsc_msr: Update comments, expand definitions
  x86/tsc_msr: Remove debugging messages
  x86/tsc_msr: Identify Intel-specific code
  Revert "x86/tsc: Add missing Cherrytrail frequency to the table"
2016-07-25 19:41:35 -07:00
Linus Torvalds
8e466955d6 Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Intel-SoC enhancements (Andy Shevchenko)

   - Intel CPU symbolic model definition rework (Dave Hansen)

   - ... other misc changes"

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  x86/sfi: Enable enumeration of SD devices
  x86/pci: Use MRFLD abbreviation for Merrifield
  x86/platform/intel-mid: Make vertical indentation consistent
  x86/platform/intel-mid: Mark regulators explicitly defined
  x86/platform/intel-mid: Rename mrfl.c to mrfld.c
  x86/platform/intel-mid: Enable spidev on Intel Edison boards
  x86/platform/intel-mid: Extend PWRMU to support Penwell
  x86/pci, x86/platform/intel_mid_pci: Remove duplicate power off code
  x86/platform/intel-mid: Add pinctrl for Intel Merrifield
  x86/platform/intel-mid: Enable GPIO expanders on Edison
  x86/platform/intel-mid: Add Power Management Unit driver
  x86/platform/atom/punit: Enable support for Merrifield
  x86/platform/intel_mid_pci: Rework IRQ0 workaround
  x86, thermal: Clean up and fix CPU model detection for intel_soc_dts_thermal
  x86, mmc: Use Intel family name macros for mmc driver
  x86/intel_telemetry: Use Intel family name macros for telemetry driver
  x86/acpi/lss: Use Intel family name macros for the acpi_lpss driver
  x86/cpufreq: Use Intel family name macros for the intel_pstate cpufreq driver
  x86/platform: Use new Intel model number macros
  x86/intel_idle: Use Intel family macros for intel_idle
  ...
2016-07-25 19:15:35 -07:00
Linus Torvalds
2d724ffddd Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu updates from Ingo Molnar:
 "The main x86 FPU changes in this cycle were:

   - a large series of cleanups, fixes and enhancements to re-enable the
     XSAVES instruction on Intel CPUs - which is the most advanced
     instruction to do FPU context switches (Yu-cheng Yu, Fenghua Yu)

   - Add FPU tracepoints for the FPU state machine (Dave Hansen)"

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu: Do not BUG_ON() in early FPU code
  x86/fpu/xstate: Re-enable XSAVES
  x86/fpu/xstate: Fix fpstate_init() for XRSTORS
  x86/fpu/xstate: Return NULL for disabled xstate component address
  x86/fpu/xstate: Fix __fpu_restore_sig() for XSAVES
  x86/fpu/xstate: Fix xstate_offsets, xstate_sizes for non-extended xstates
  x86/fpu/xstate: Fix XSTATE component offset print out
  x86/fpu/xstate: Fix PTRACE frames for XSAVES
  x86/fpu/xstate: Fix supervisor xstate component offset
  x86/fpu/xstate: Align xstate components according to CPUID
  x86/fpu/xstate: Copy xstate registers directly to the signal frame when compacted format is in use
  x86/fpu/xstate: Keep init_fpstate.xsave.header.xfeatures as zero for init optimization
  x86/fpu/xstate: Rename 'xstate_size' to 'fpu_kernel_xstate_size', to distinguish it from 'fpu_user_xstate_size'
  x86/fpu/xstate: Define and use 'fpu_user_xstate_size'
  x86/fpu: Add tracepoints to dump FPU state at key points
2016-07-25 18:48:27 -07:00
Linus Torvalds
36e635cb21 Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 stackdump update from Ingo Molnar:
 "A number of stackdump enhancements"

* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/dumpstack: Add show_stack_regs() and use it
  printk: Make the printk*once() variants return a value
  x86/dumpstack: Honor supplied @regs arg
2016-07-25 18:18:04 -07:00
Linus Torvalds
c265cc5c3c Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
 "Three small cleanups"

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  lguest: Read offset of device_cap later
  lguest: Read length of device_cap later
  x86: Do away with ARCH_[WANT_OPTIONAL|REQUIRE]_GPIOLIB
2016-07-25 18:06:00 -07:00
Linus Torvalds
80f09cf5c1 Merge branch 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 build updates from Ingo Molnar:
 "A build system fix and a cleanup"

* 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kbuild: Remove stale asm-generic wrappers
  kbuild, x86: Track generated headers with generated-y
2016-07-25 18:00:18 -07:00
Linus Torvalds
77cd3d0c43 Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar:
 "The main changes:

   - add initial commits to randomize kernel memory section virtual
     addresses, enabled via a new kernel option: RANDOMIZE_MEMORY
     (Thomas Garnier, Kees Cook, Baoquan He, Yinghai Lu)

   - enhance KASLR (RANDOMIZE_BASE) physical memory randomization (Kees
     Cook)

   - EBDA/BIOS region boot quirk cleanups (Andy Lutomirski, Ingo Molnar)

   - misc cleanups/fixes"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Simplify EBDA-vs-BIOS reservation logic
  x86/boot: Clarify what x86_legacy_features.reserve_bios_regions does
  x86/boot: Reorganize and clean up the BIOS area reservation code
  x86/mm: Do not reference phys addr beyond kernel
  x86/mm: Add memory hotplug support for KASLR memory randomization
  x86/mm: Enable KASLR for vmalloc memory regions
  x86/mm: Enable KASLR for physical mapping memory regions
  x86/mm: Implement ASLR for kernel memory regions
  x86/mm: Separate variable for trampoline PGD
  x86/mm: Add PUD VA support for physical mapping
  x86/mm: Update physical mapping variable names
  x86/mm: Refactor KASLR entropy functions
  x86/KASLR: Fix boot crash with certain memory configurations
  x86/boot/64: Add forgotten end of function marker
  x86/KASLR: Allow randomization below the load address
  x86/KASLR: Extend kernel image physical address randomization to addresses larger than 4G
  x86/KASLR: Randomize virtual address separately
  x86/KASLR: Clarify identity map interface
  x86/boot: Refuse to build with data relocations
  x86/KASLR, x86/power: Remove x86 hibernation restrictions
2016-07-25 17:32:28 -07:00
Linus Torvalds
0f657262d5 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
 "Various x86 low level modifications:

   - preparatory work to support virtually mapped kernel stacks (Andy
     Lutomirski)

   - support for 64-bit __get_user() on 32-bit kernels (Benjamin
     LaHaise)

   - (involved) workaround for Knights Landing CPU erratum (Dave Hansen)

   - MPX enhancements (Dave Hansen)

   - mremap() extension to allow remapping of the special VDSO vma, for
     purposes of user level context save/restore (Dmitry Safonov)

   - hweight and entry code cleanups (Borislav Petkov)

   - bitops code generation optimizations and cleanups with modern GCC
     (H. Peter Anvin)

   - syscall entry code optimizations (Paolo Bonzini)"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
  x86/mm/cpa: Add missing comment in populate_pdg()
  x86/mm/cpa: Fix populate_pgd(): Stop trying to deallocate failed PUDs
  x86/syscalls: Add compat_sys_preadv64v2/compat_sys_pwritev64v2
  x86/smp: Remove unnecessary initialization of thread_info::cpu
  x86/smp: Remove stack_smp_processor_id()
  x86/uaccess: Move thread_info::addr_limit to thread_struct
  x86/dumpstack: Rename thread_struct::sig_on_uaccess_error to sig_on_uaccess_err
  x86/uaccess: Move thread_info::uaccess_err and thread_info::sig_on_uaccess_err to thread_struct
  x86/dumpstack: When OOPSing, rewind the stack before do_exit()
  x86/mm/64: In vmalloc_fault(), use CR3 instead of current->active_mm
  x86/dumpstack/64: Handle faults when printing the "Stack: " part of an OOPS
  x86/dumpstack: Try harder to get a call trace on stack overflow
  x86/mm: Remove kernel_unmap_pages_in_pgd() and efi_cleanup_page_tables()
  x86/mm/cpa: In populate_pgd(), don't set the PGD entry until it's populated
  x86/mm/hotplug: Don't remove PGD entries in remove_pagetable()
  x86/mm: Use pte_none() to test for empty PTE
  x86/mm: Disallow running with 32-bit PTEs to work around erratum
  x86/mm: Ignore A/D bits in pte/pmd/pud_none()
  x86/mm: Move swap offset/type up in PTE to work around erratum
  x86/entry: Inline enter_from_user_mode()
  ...
2016-07-25 15:34:18 -07:00
Linus Torvalds
425dbc6db3 Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/apic updates from Ingo Molnar:
 "Misc cleanups and a small fix"

* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic: Remove the unused struct apic::apic_id_mask field
  x86/apic: Fix misspelled APIC
  x86/ioapic: Simplify ioapic_setup_resources()
2016-07-25 15:09:08 -07:00
Linus Torvalds
cca08cd66c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:

 - introduce and use task_rcu_dereference()/try_get_task_struct() to fix
   and generalize task_struct handling (Oleg Nesterov)

 - do various per entity load tracking (PELT) fixes and optimizations
   (Peter Zijlstra)

 - cputime virt-steal time accounting enhancements/fixes (Wanpeng Li)

 - introduce consolidated cputime output file cpuacct.usage_all and
   related refactorings (Zhao Lei)

 - ... plus misc fixes and enhancements

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/core: Panic on scheduling while atomic bugs if kernel.panic_on_warn is set
  sched/cpuacct: Introduce cpuacct.usage_all to show all CPU stats together
  sched/cpuacct: Use loop to consolidate code in cpuacct_stats_show()
  sched/cpuacct: Merge cpuacct_usage_index and cpuacct_stat_index enums
  sched/fair: Rework throttle_count sync
  sched/core: Fix sched_getaffinity() return value kerneldoc comment
  sched/fair: Reorder cgroup creation code
  sched/fair: Apply more PELT fixes
  sched/fair: Fix PELT integrity for new tasks
  sched/cgroup: Fix cpu_cgroup_fork() handling
  sched/fair: Fix PELT integrity for new groups
  sched/fair: Fix and optimize the fork() path
  sched/cputime: Add steal time support to full dynticks CPU time accounting
  sched/cputime: Fix prev steal time accouting during CPU hotplug
  KVM: Fix steal clock warp during guest CPU hotplug
  sched/debug: Always show 'nr_migrations'
  sched/fair: Use task_rcu_dereference()
  sched/api: Introduce task_rcu_dereference() and try_get_task_struct()
  sched/idle: Optimize the generic idle loop
  sched/fair: Fix the wrong throttled clock time for cfs_rq_clock_task()
2016-07-25 13:59:34 -07:00
Linus Torvalds
7e4dc77b28 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "With over 300 commits it's been a busy cycle - with most of the work
  concentrated on the tooling side (as it should).

  The main kernel side enhancements were:

   - Add per event callchain limit: Recently we introduced a sysctl to
     tune the max-stack for all events for which callchains were
     requested:

       $ sysctl kernel.perf_event_max_stack
       kernel.perf_event_max_stack = 127

     Now this patch introduces a way to configure this per event, i.e.
     this becomes possible:

       $ perf record -e sched:*/max-stack=2/ -e block:*/max-stack=10/ -a

     allowing finer tuning of how much buffer space callchains use.

     This uses an u16 from the reserved space at the end, leaving
     another u16 for future use.

     There has been interest in even finer tuning, namely to control the
     max stack for kernel and userspace callchains separately.  Further
     discussion is needed, we may for instance use the remaining u16 for
     that and when it is present, assume that the sample_max_stack
     introduced in this patch applies for the kernel, and the u16 left
     is used for limiting the userspace callchain (Arnaldo Carvalho de
     Melo)

   - Optimize AUX event (hardware assisted side-band event) delivery
     (Kan Liang)

   - Rework Intel family name macro usage (this is partially x86 arch
     work) (Dave Hansen)

   - Refine and fix Intel LBR support (David Carrillo-Cisneros)

   - Add support for Intel 'TopDown' events (Andi Kleen)

   - Intel uncore PMU driver fixes and enhancements (Kan Liang)

   - ... other misc changes.

  Here's an incomplete list of the tooling enhancements (but there's
  much more, see the shortlog and the git log for details):

   - Support cross unwinding, i.e.  collecting '--call-graph dwarf'
     perf.data files in one machine and then doing analysis in another
     machine of a different hardware architecture.  This enables, for
     instance, to do:

       $ perf record -a --call-graph dwarf

     on a x86-32 or aarch64 system and then do 'perf report' on it on a
     x86_64 workstation (He Kuang)

   - Allow reading from a backward ring buffer (one setup via
     sys_perf_event_open() with perf_event_attr.write_backward = 1)
     (Wang Nan)

   - Finish merging initial SDT (Statically Defined Traces) support, see
     cset comments for details about how it all works (Masami Hiramatsu)

   - Support attaching eBPF programs to tracepoints (Wang Nan)

   - Add demangling of symbols in programs written in the Rust language
     (David Tolnay)

   - Add support for tracepoints in the python binding, including an
     example, that sets up and parses sched:sched_switch events,
     tools/perf/python/tracepoint.py (Jiri Olsa)

   - Introduce --stdio-color to set up the color output mode selection
     in 'annotate' and 'report', allowing emit color escape sequences
     when redirecting the output of these tools (Arnaldo Carvalho de
     Melo)

   - Add 'callindent' option to 'perf script -F', to indent the Intel PT
     call stack, making this output more ftrace-like (Adrian Hunter,
     Andi Kleen)

   - Allow dumping the object files generated by llvm when processing
     eBPF scriptlet events (Wang Nan)

   - Add stackcollapse.py script to help generating flame graphs (Paolo
     Bonzini)

   - Add --ldlat option to 'perf mem' to specify load latency for loads
     event (e.g. cpu/mem-loads/ ) (Jiri Olsa)

   - Tooling support for Intel TopDown counters, recently added to the
     kernel (Andi Kleen)"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (303 commits)
  perf tests: Add is_printable_array test
  perf tools: Make is_printable_array global
  perf script python: Fix string vs byte array resolving
  perf probe: Warn unmatched function filter correctly
  perf cpu_map: Add more helpers
  perf stat: Balance opening and reading events
  tools: Copy linux/{hash,poison}.h and check for drift
  perf tools: Remove include/linux/list.h from perf's MANIFEST
  tools: Copy the bitops files accessed from the kernel and check for drift
  Remove: kernel unistd*h files from perf's MANIFEST, not used
  perf tools: Remove tools/perf/util/include/linux/const.h
  perf tools: Remove tools/perf/util/include/asm/byteorder.h
  perf tools: Add missing linux/compiler.h include to perf-sys.h
  perf jit: Remove some no-op error handling
  perf jit: Add missing curly braces
  objtool: Initialize variable to silence old compiler
  objtool: Add -I$(srctree)/tools/arch/$(ARCH)/include/uapi
  perf record: Add --tail-synthesize option
  perf session: Don't warn about out of order event if write_backward is used
  perf tools: Enable overwrite settings
  ...
2016-07-25 13:20:41 -07:00
Linus Torvalds
89e7eb098a Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Ingo Molnar:
 "The biggest change in this cycle was an enhancement by Yazen Ghannam
  to reduce the number of MCE error injection related IPIs.

  The rest are smaller fixes"

* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Fix mce_rdmsrl() warning message
  x86/RAS/AMD: Reduce the number of IPIs when prepping error injection
  x86/mce/AMD: Increase size of the bank_map type
  x86/mce: Do not use bank 1 for APEI generated error logs
2016-07-25 13:13:19 -07:00
Linus Torvalds
c86ad14d30 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The locking tree was busier in this cycle than the usual pattern - a
  couple of major projects happened to coincide.

  The main changes are:

   - implement the atomic_fetch_{add,sub,and,or,xor}() API natively
     across all SMP architectures (Peter Zijlstra)

   - add atomic_fetch_{inc/dec}() as well, using the generic primitives
     (Davidlohr Bueso)

   - optimize various aspects of rwsems (Jason Low, Davidlohr Bueso,
     Waiman Long)

   - optimize smp_cond_load_acquire() on arm64 and implement LSE based
     atomic{,64}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
     on arm64 (Will Deacon)

   - introduce smp_acquire__after_ctrl_dep() and fix various barrier
     mis-uses and bugs (Peter Zijlstra)

   - after discovering ancient spin_unlock_wait() barrier bugs in its
     implementation and usage, strengthen its semantics and update/fix
     usage sites (Peter Zijlstra)

   - optimize mutex_trylock() fastpath (Peter Zijlstra)

   - ... misc fixes and cleanups"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
  locking/atomic: Introduce inc/dec variants for the atomic_fetch_$op() API
  locking/barriers, arch/arm64: Implement LDXR+WFE based smp_cond_load_acquire()
  locking/static_keys: Fix non static symbol Sparse warning
  locking/qspinlock: Use __this_cpu_dec() instead of full-blown this_cpu_dec()
  locking/atomic, arch/tile: Fix tilepro build
  locking/atomic, arch/m68k: Remove comment
  locking/atomic, arch/arc: Fix build
  locking/Documentation: Clarify limited control-dependency scope
  locking/atomic, arch/rwsem: Employ atomic_long_fetch_add()
  locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()
  locking/atomic, arch/mips: Convert to _relaxed atomics
  locking/atomic, arch/alpha: Convert to _relaxed atomics
  locking/atomic: Remove the deprecated atomic_{set,clear}_mask() functions
  locking/atomic: Remove linux/atomic.h:atomic_fetch_or()
  locking/atomic: Implement atomic{,64,_long}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
  locking/atomic: Fix atomic64_relaxed() bits
  locking/atomic, arch/xtensa: Implement atomic_fetch_{add,sub,and,or,xor}()
  locking/atomic, arch/x86: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
  locking/atomic, arch/tile: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
  locking/atomic, arch/sparc: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
  ...
2016-07-25 12:41:29 -07:00
Linus Torvalds
a2303849a6 Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Ingo Molnar:
 "The biggest change in this cycle were SGI/UV related changes that
  clean up and fix UV boot quirks and problems.

  There's also various smaller cleanups and refinements"

* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi: Reorganize the GUID table to make it easier to read
  x86/efi: Remove the unused efi_get_time() function
  x86/efi: Update efi_thunk() to use the the arch_efi_call_virt*() macros
  x86/uv: Update uv_bios_call() to use efi_call_virt_pointer()
  efi: Convert efi_call_virt() to efi_call_virt_pointer()
  x86/efi: Remove unused variable 'efi'
  efi: Document #define FOO_PROTOCOL_GUID layout
  efibc: Report more information in the error messages
2016-07-25 12:30:01 -07:00
Stephen Rothwell
d51306f1a3 x86: Make the vdso2c compiler use the host architecture headers
To be clear: this is a ppc64le hosted, x86_64 target cross build.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160723150845.3af8e452@canb.auug.org.au
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-25 10:02:21 -03:00
Vitaly Kuznetsov
ee42d665d3 xen/pvhvm: run xen_vcpu_setup() for the boot CPU
Historically we didn't call VCPUOP_register_vcpu_info for CPU0 for
PVHVM guests (while we had it for PV and ARM guests). This is usually
fine as we can use vcpu info in the shared_info page but when we try
booting on a vCPU with Xen's vCPU id > 31 (e.g. when we try to kdump
after crashing on this CPU) we're not able to boot.

Switch to always doing VCPUOP_register_vcpu_info for the boot CPU.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-07-25 13:34:23 +01:00
Vitaly Kuznetsov
e15a862193 x86/xen: use xen_vcpu_id mapping when pointing vcpu_info to shared_info
shared_info page has space for 32 vcpu info slots for first 32 vCPUs
but these are the first 32 vCPUs from Xen's perspective and we should
map them accordingly with the newly introduced xen_vcpu_id mapping.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-07-25 13:33:40 +01:00
Vitaly Kuznetsov
ad5475f9fa x86/xen: use xen_vcpu_id mapping for HYPERVISOR_vcpu_op
HYPERVISOR_vcpu_op() passes Linux's idea of vCPU id as a parameter
while Xen's idea is expected. In some cases these ideas diverge so we
need to do remapping.

Convert all callers of HYPERVISOR_vcpu_op() to use xen_vcpu_nr().

Leave xen_fill_possible_map() and xen_filter_cpu_maps() intact as
they're only being called by PV guests before perpu areas are
initialized. While the issue could be solved by switching to
early_percpu for xen_vcpu_id I think it's not worth it: PV guests will
probably never get to the point where their idea of vCPU id diverges
from Xen's.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-07-25 13:32:34 +01:00
Vitaly Kuznetsov
88e957d6e4 xen: introduce xen_vcpu_id mapping
It may happen that Xen's and Linux's ideas of vCPU id diverge. In
particular, when we crash on a secondary vCPU we may want to do kdump
and unlike plain kexec where we do migrate_to_reboot_cpu() we try
booting on the vCPU which crashed. This doesn't work very well for
PVHVM guests as we have a number of hypercalls where we pass vCPU id
as a parameter. These hypercalls either fail or do something
unexpected.

To solve the issue introduce percpu xen_vcpu_id mapping. ARM and PV
guests get direct mapping for now. Boot CPU for PVHVM guest gets its
id from CPUID. With secondary CPUs it is a bit more
trickier. Currently, we initialize IPI vectors before these CPUs boot
so we can't use CPUID. Use ACPI ids from MADT instead.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-07-25 13:31:16 +01:00
Vitaly Kuznetsov
3e9e57fad3 x86/acpi: store ACPI ids from MADT for future usage
Currently we don't save ACPI ids (unlike LAPIC ids which go to
x86_cpu_to_apicid) from MADT and we may need this information later.
Particularly, ACPI ids is the only existent way for a PVHVM Xen guest
to figure out Xen's idea of its vCPUs ids before these CPUs boot and
in some cases these ids diverge from Linux's cpu ids.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-07-25 13:30:53 +01:00
Vitaly Kuznetsov
de2f5537b3 x86/xen: update cpuid.h from Xen-4.7
Update cpuid.h header from xen hypervisor tree to get
XEN_HVM_CPUID_VCPU_ID_PRESENT definition.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-07-25 13:30:45 +01:00
Rafael J. Wysocki
bc841e260c Merge branch 'pm-cpu'
* pm-cpu:
  x86: remove duplicate turbo ratio limit MSRs
  tools/power turbostat: Replace MSR_NHM_TURBO_RATIO_LIMIT
  cpufreq: intel_pstate: Replace MSR_NHM_TURBO_RATIO_LIMIT
2016-07-25 13:46:30 +02:00
Rafael J. Wysocki
9fedbb3b6b Merge branch 'x86/cpu' from tip 2016-07-25 13:45:39 +02:00
Rafael J. Wysocki
7f234a4d8a Merge branches 'pm-sleep' and 'pm-tools'
* pm-sleep:
  PM / hibernate: Introduce test_resume mode for hibernation
  x86 / hibernate: Use hlt_play_dead() when resuming from hibernation
  PM / hibernate: Image data protection during restoration
  PM / hibernate: Add missing braces in __register_nosave_region()
  PM / hibernate: Clean up comments in snapshot.c
  PM / hibernate: Clean up function headers in snapshot.c
  PM / hibernate: Add missing braces in hibernate_setup()
  PM / hibernate: Recycle safe pages after image restoration
  PM / hibernate: Simplify mark_unsafe_pages()
  PM / hibernate: Do not free preallocated safe pages during image restore
  PM / suspend: show workqueue state in suspend flow
  PM / sleep: make PM notifiers called symmetrically
  PM / sleep: Make pm_prepare_console() return void
  PM / Hibernate: Don't let kasan instrument snapshot.c

* pm-tools:
  PM / tools: scripts: AnalyzeSuspend v4.2
  tools/turbostat: allow user to alter DESTDIR and PREFIX
2016-07-25 13:44:32 +02:00
Rafael J. Wysocki
d5f017b796 Merge branch 'acpi-tables'
* acpi-tables:
  ACPI: Rename configfs.c to acpi_configfs.c to prevent link error
  ACPI: add support for loading SSDTs via configfs
  ACPI: add support for configfs
  efi / ACPI: load SSTDs from EFI variables
  spi / ACPI: add support for ACPI reconfigure notifications
  i2c / ACPI: add support for ACPI reconfigure notifications
  ACPI: add support for ACPI reconfiguration notifiers
  ACPI / scan: fix enumeration (visited) flags for bus rescans
  ACPI / documentation: add SSDT overlays documentation
  ACPI: ARM64: support for ACPI_TABLE_UPGRADE
  ACPI / tables: introduce ARCH_HAS_ACPI_TABLE_UPGRADE
  ACPI / tables: move arch-specific symbol to asm/acpi.h
  ACPI / tables: table upgrade: refactor function definitions
  ACPI / tables: table upgrade: use cacheable map for tables

Conflicts:
	arch/arm64/include/asm/acpi.h
2016-07-25 13:41:01 +02:00
Rafael J. Wysocki
d85f4eb699 Merge branch 'acpi-numa'
* acpi-numa:
  ACPI / NUMA: Enable ACPI based NUMA on ARM64
  arm64, ACPI, NUMA: NUMA support based on SRAT and SLIT
  ACPI / processor: Add acpi_map_madt_entry()
  ACPI / NUMA: Improve SRAT error detection and add messages
  ACPI / NUMA: Move acpi_numa_memory_affinity_init() to drivers/acpi/numa.c
  ACPI / NUMA: remove unneeded acpi_numa=1
  ACPI / NUMA: move bad_srat() and srat_disabled() to drivers/acpi/numa.c
  x86 / ACPI / NUMA: cleanup acpi_numa_processor_affinity_init()
  arm64, NUMA: Cleanup NUMA disabled messages
  arm64, NUMA: rework numa_add_memblk()
  ACPI / NUMA: move acpi_numa_slit_init() to drivers/acpi/numa.c
  ACPI / NUMA: Move acpi_numa_arch_fixup() to ia64 only
  ACPI / NUMA: remove duplicate NULL check
  ACPI / NUMA: Replace ACPI_DEBUG_PRINT() with pr_debug()
  ACPI / NUMA: Use pr_fmt() instead of printk
2016-07-25 13:40:39 +02:00
Dan Williams
0606263f24 Merge branch 'for-4.8/libnvdimm' into libnvdimm-for-next 2016-07-24 08:05:44 -07:00
David S. Miller
de0ba9a0d8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Just several instances of overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24 00:53:32 -04:00
Andy Lutomirski
55920d31f1 x86/mm/cpa: Add missing comment in populate_pdg()
In commit:

  21cbc2822aa1 ("x86/mm/cpa: Unbreak populate_pgd(): stop trying to deallocate failed PUDs")

I intended to add this comment, but I failed at using git.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/242baf8612394f4e31216f96d13c4d2e9b90d1b7.1469293159.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-23 21:17:10 +02:00
Andy Lutomirski
530dd8d4b9 x86/mm/cpa: Fix populate_pgd(): Stop trying to deallocate failed PUDs
Valdis Kletnieks bisected a boot failure back to this recent commit:

  360cb4d155 ("x86/mm/cpa: In populate_pgd(), don't set the PGD entry until it's populated")

I broke the case where a PUD table got allocated -- populate_pud()
would wander off a pgd_none entry and get lost.  I'm not sure how
this survived my testing.

Fix the original issue in a much simpler way.  The problem
was that, if we allocated a PUD table, failed to populate it, and
freed it, another CPU could potentially keep using the PGD entry we
installed (either by copying it via vmalloc_fault or by speculatively
caching it).  There's a straightforward fix: simply leave the
top-level entry in place if this happens.  This can't waste any
significant amount of memory -- there are at most 256 entries like
this systemwide and, as a practical matter, if we hit this failure
path repeatedly, we're likely to reuse the same page anyway.

For context, this is a reversion with this hunk added in:

	if (ret < 0) {
+		/*
+		 * Leave the PUD page in place in case some other CPU or thread
+		 * already found it, but remove any useless entries we just
+		 * added to it.
+		 */
-		unmap_pgd_range(cpa->pgd, addr,
+		unmap_pud_range(pgd_entry, addr,
			        addr + (cpa->numpages << PAGE_SHIFT));
		return ret;
	}

This effectively open-codes what the now-deleted unmap_pgd_range()
function used to do except that unmap_pgd_range() used to try to
free the page as well.

Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Mike Krinkin <krinkin.m.u@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/21cbc2822aa18aa812c0215f4231dbf5f65afa7f.1469249789.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-23 21:13:25 +02:00
Dan Williams
fd1d961dd6 x86/insn: remove pcommit
The pcommit instruction is being deprecated in favor of either ADR
(asynchronous DRAM refresh: flush-on-power-fail) at the platform level, or
posted-write-queue flush addresses as defined by the ACPI 6.x NFIT (NVDIMM
Firmware Interface Table).

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-23 11:04:23 -07:00
Dan Williams
dfa169bbee Revert "KVM: x86: add pcommit support"
This reverts commit 8b3e34e46a.

Given the deprecation of the pcommit instruction, the relevant VMX
features and CPUID bits are not going to be rolled into the SDM.  Remove
their usage from KVM.

Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-23 11:04:23 -07:00
Eric Auger
d9565a7399 KVM: Move kvm_setup_default/empty_irq_routing declaration in arch specific header
kvm_setup_default_irq_routing and kvm_setup_empty_irq_routing are
not used by generic code. So let's move the declarations in x86 irq.h
header instead of kvm_host.h.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Suggested-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-07-22 18:52:00 +01:00
Andy Lutomirski
6a79296cb1 x86/boot: Simplify EBDA-vs-BIOS reservation logic
Both the intent and the effect of reserve_bios_regions() is simple:
reserve the range from the apparent BIOS start (suitably filtered)
through 1MB and, if the EBDA start address is sensible, extend that
reservation downward to cover the EBDA as well.

The code is overcomplicated, though, and contains head-scratchers
like:

	if (ebda_start < BIOS_START_MIN)
		ebda_start = BIOS_START_MAX;

That snipped is trying to say "if ebda_start < BIOS_START_MIN,
ignore it".

Simplify it: reorder the code so that it makes sense.  This should
have no functional effect under any circumstances.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Mario Limonciello <mario_limonciello@dell.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/ef89c0c761be20ead8bd9a3275743e6259b6092a.1469135598.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-22 11:46:01 +02:00
Andy Lutomirski
30f027398b x86/boot: Clarify what x86_legacy_features.reserve_bios_regions does
It doesn't just control probing for the EBDA -- it controls whether we
detect and reserve the <1MB BIOS regions in general.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Mario Limonciello <mario_limonciello@dell.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/55bd591115498440d461857a7b64f349a5d911f3.1469135598.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-22 11:46:01 +02:00
Dave Hansen
ec3ed4a210 x86/fpu: Do not BUG_ON() in early FPU code
I don't think it is really possible to have a system where CPUID
enumerates support for XSAVE but that it does not have FP/SSE
(they are "legacy" features and always present).

But, I did manage to hit this case in qemu when I enabled its
somewhat shaky XSAVE support.  The bummer is that the FPU is set
up before we parse the command-line or have *any* console support
including earlyprintk.  That turned what should have been an easy
thing to debug in to a bit more of an odyssey.

So a BUG() here is worthless.  All it does it guarantee that
if/when we hit this case we have an empty console.  So, remove
the BUG() and try to limp along by disabling XSAVE and trying to
continue.  Add a comment on why we are doing this, and also add
a common "out_disable" path for leaving fpu__init_system_xstate().

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160720194551.63BB2B58@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-21 18:18:45 +02:00
Adrian Hunter
25af37f4e1 x86/insn: Add AVX-512 support to the instruction decoder
Add support for Intel's AVX-512 instructions to the instruction decoder.

AVX-512 instructions are documented in Intel Architecture Instruction
Set Extensions Programming Reference (February 2016).

AVX-512 instructions are identified by a EVEX prefix which, for the
purpose of instruction decoding, can be treated as though it were a
4-byte VEX prefix.

Existing instructions which can now accept an EVEX prefix need not be
further annotated in the op code map (x86-opcode-map.txt). In the case
of new instructions, the op code map is updated accordingly.

Also add associated Mask Instructions that are used to manipulate mask
registers used in AVX-512 instructions.

The 'perf tools' instruction decoder is updated in a subsequent patch.
And a representative set of instructions is added to the perf tools new
instructions test in a subsequent patch.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: X86 ML <x86@kernel.org>
Link: http://lkml.kernel.org/r/1469003437-32706-3-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-21 09:37:11 -03:00
Ingo Molnar
edce21216a x86/boot: Reorganize and clean up the BIOS area reservation code
So the reserve_ebda_region() code has accumulated a number of
problems over the years that make it really difficult to read
and understand:

- The calculation of 'lowmem' and 'ebda_addr' is an unnecessarily
  interleaved mess of first lowmem, then ebda_addr, then lowmem tweaks...

- 'lowmem' here means 'super low mem' - i.e. 16-bit addressable memory. In other
  parts of the x86 code 'lowmem' means 32-bit addressable memory... This makes it
  super confusing to read.

- It does not help at all that we have various memory range markers, half of which
  are 'start of range', half of which are 'end of range' - but this crucial
  property is not obvious in the naming at all ... gave me a headache trying to
  understand all this.

- Also, the 'ebda_addr' name sucks: it highlights that it's an address (which is
  obvious, all values here are addresses!), while it does not highlight that it's
  the _start_ of the EBDA region ...

- 'BIOS_LOWMEM_KILOBYTES' says a lot of things, except that this is the only value
  that is a pointer to a value, not a memory range address!

- The function name itself is a misnomer: it says 'reserve_ebda_region()' while
  its main purpose is to reserve all the firmware ROM typically between 640K and
  1MB, while the 'EBDA' part is only a small part of that ...

- Likewise, the paravirt quirk flag name 'ebda_search' is misleading as well: this
  too should be about whether to reserve firmware areas in the paravirt case.

- In fact thinking about this as 'end of RAM' is confusing: what this function
  *really* wants to reserve is firmware data and code areas! Once the thinking is
  inverted from a mixed 'ram' and 'reserved firmware area' notion to a pure
  'reserved area' notion everything becomes a lot clearer.

To improve all this rewrite the whole code (without changing the logic):

- Firstly invert the naming from 'lowmem end' to 'BIOS reserved area start'
  and propagate this concept through all the variable names and constants.

	BIOS_RAM_SIZE_KB_PTR		// was: BIOS_LOWMEM_KILOBYTES

	BIOS_START_MIN			// was: INSANE_CUTOFF

	ebda_start			// was: ebda_addr
	bios_start			// was: lowmem

	BIOS_START_MAX			// was: LOWMEM_CAP

- Then clean up the name of the function itself by renaming it
  to reserve_bios_regions() and renaming the ::ebda_search paravirt
  flag to ::reserve_bios_regions.

- Fix up all the comments (fix typos), harmonize and simplify their
  formulation and remove comments that become unnecessary due to
  the much better naming all around.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-21 10:11:57 +02:00
Adrian Hunter
6f6ef07f41 x86/insn: perf tools: Fix vcvtph2ps instruction decoding
vcvtph2ps does not have an immediate operand, so remove the erroneous
'Ib' from its opcode map entry. Add vcvtph2ps to the perf tools new
instructions test to verify it.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: X86 ML <x86@kernel.org>
Link: http://lkml.kernel.org/r/1469003437-32706-2-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-20 09:57:46 -03:00
Peter Zijlstra
08e237fa56 x86/cpu: Add workaround for MONITOR instruction erratum on Goldmont based CPUs
Monitored cached line may not wake up from mwait on certain
Goldmont based CPUs. This patch will avoid calling
current_set_polling_and_test() and thereby not set the TIF_ flag.
The result is that we'll always send IPIs for wakeups.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1468867270-18493-1-git-send-email-jacob.jun.pan@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-20 09:48:40 +02:00
Ingo Molnar
4fffe71dd9 Merge branch 'linus' into x86/cpu, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-20 09:46:42 +02:00
Paul Gortmaker
a47177d360 x86, crypto: Restore MODULE_LICENSE() to glue_helper.c so it loads
In commit:

  eb008eb6f8 ("x86: Audit and remove any remaining unnecessary uses of module.h")

... we looked for instances of module.h that were not supporting anything
more than exported symbols.

To facilitate the exchange of module.h to the much smaller export.h
we occasionally remove tags like MODULE_AUTHOR() etc. which in the case
of built in files, are no-ops and hence that is fine, assuming the
info is already in the comments at the top of the file..

However the error here is that I overlooked that this file was used
not as a driver, but as a library of functions, and hence has no
explicit modular linkage functions or similar, making it _appear_
non-modular.  We can see that in retrospect with:

  arch/x86/crypto/Makefile:obj-$(CONFIG_CRYPTO_GLUE_HELPER_X86) += glue_helper.o

  crypto/Kconfig:config CRYPTO_GLUE_HELPER_X86
  crypto/Kconfig: tristate

Since we removed what was an active MODULE_LICENSE(), the module failed
to load and then automated testing showed the missing glue helpers as:

  glue_helper: Unknown symbol blkcipher_walk_done (err 0)
  glue_helper: Unknown symbol blkcipher_walk_virt (err 0)
  glue_helper: Unknown symbol kernel_fpu_end (err 0)
  glue_helper: Unknown symbol kernel_fpu_begin (err 0)
  glue_helper: Unknown symbol blkcipher_walk_virt_block (err 0)

So we do a partial revert of that change to just this one file, and
watch for similar MODULE_LICENSE() only cases in future audits.

Reported-by: kernel test robot <xiaolong.ye@intel.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-crypto@vger.kernel.org
Cc: lkp@01.org
Fixes: eb008eb6f8 ("x86: Audit and remove any remaining unnecessary uses of module.h")
Link: http://lkml.kernel.org/r/20160719144243.GK21225@windriver.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-20 09:39:50 +02:00
Wei Yongjun
f6329088b3 x86/apic: Remove duplicated include from probe_64.c
Remove duplicated include.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Link: http://lkml.kernel.org/r/1468929740-8999-1-git-send-email-weiyj_lk@163.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-07-19 16:02:31 +02:00
Wei Yongjun
2384d1d832 x86/ce4100: Remove duplicated include from ce4100.c
Remove duplicated include.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Link: http://lkml.kernel.org/r/1468929731-8900-1-git-send-email-weiyj_lk@163.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-07-19 16:02:17 +02:00
Stephen Rothwell
a203800d97 x86/headers: Include spinlock_types.h in x8664_ksyms_64.c for missing spinlock_t
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-next@vger.kernel.org
Fixes: 186f43608a ("x86/kernel: Audit and remove any unnecessary uses of module.h")
Link: http://lkml.kernel.org/r/20160718182922.7b41f923@canb.auug.org.au
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-19 09:59:07 +02:00
Andy Lutomirski
57f90c3dfc x86/vdso: Error out if the vDSO isn't a valid DSO
Some distros has been playing with toolchain changes that can affect
the type of ELF objects built.  Occasionally, this goes wrong and
the vDSO ends up not being a DSO at all.  This causes the kernel to
end up broken in a surprisingly subtle way -- glibc apparently
silently ignores a vDSO that isn't a DSO, so everything works,
albeit slowly, until users try a different libc implementation.

Make the kernel build process a bit more robust: fail outright if
the vDSO isn't ET_DYN or is missing its PT_DYNAMIC segment.  I've
never seen this in an unmodified kernel.

See: https://github.com/docker/docker/issues/23378

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/8a30e0a07c3b47ff917a8daa2df5e407cc0c6698.1468878336.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-19 08:50:24 +02:00
Arnd Bergmann
58ab5e0c2c Kbuild: arch: look for generated headers in obtree
There are very few files that need add an -I$(obj) gcc for the preprocessor
or the assembler. For C files, we add always these for both the objtree and
srctree, but for the other ones we require the Makefile to add them, and
Kbuild then adds it for both trees.

As a preparation for changing the meaning of the -I$(obj) directive to
only refer to the srctree, this changes the two instances in arch/x86 to use
an explictit $(objtree) prefix where needed, otherwise we won't find the
headers any more, as reported by the kbuild 0day builder.

arch/x86/realmode/rm/realmode.lds.S:75:20: fatal error: pasyms.h: No such file or directory

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michal Marek <mmarek@suse.com>
2016-07-18 21:31:35 +02:00
Paul Gortmaker
00677f826b x86/platform: Delete extraneous MODULE_* tags fromm ts5500
This file doesn't do anything modular and hence while the tristate
Kconfig used for the gpio portion is fine, it recently got swept up in
an audit of files using the module.h header but not using any modular
registration functions.

However it is not compiled in any of the normal build coverage, and
so some remaining extraneous MODULE macro use were not found until a
randconfig from the kbuild robot came across it.

Here we remove the remaining no-op MODULE macros from the built in
portion of code relating to this Kconfig option.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Savoir-faire Linux Inc. <kernel@savoirfairelinux.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kbuild-all@01.org
Cc: linux-kernel@vger.kernel.org
Fixes: cc3ae7b0af ("x86/platform: Audit and remove any unnecessary uses of module.h")
Link: http://lkml.kernel.org/r/20160715235318.GD10758@windriver.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-16 22:41:00 +02:00
Cao, Lei
b244c9fc25 KVM: VMX: handle PML full VMEXIT that occurs during event delivery
With PML enabled, guest will shut down if a PML full VMEXIT occurs during
event delivery. According to Intel SDM 27.2.3, PML full VMEXIT can occur when
event is being delivered through IDT, so KVM should not exit to user space
with error. Instead, it should let EXIT_REASON_PML_FULL go through and the
event will be re-injected on the next VMENTRY.

Signed-off-by: Lei Cao <lei.cao@stratus.com>
Cc: stable@vger.kernel.org
Fixes: 843e433057 ("KVM: VMX: Add PML support in VMX")
[Shortened the summary and Cc'd stable.]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-07-16 15:27:40 +02:00
Radim Krčmář
6a907cd0ad Revert "KVM: SVM: fix trashing of MSR_TSC_AUX"
This reverts commit 9770404a00.

The reverted patch is not needed as only userspace uses RDTSCP and
MSR_TSC_AUX is in host_save_user_msrs[] and therefore properly saved in
svm_vcpu_load() and restored in svm_vcpu_put() before every switch to
userspace.

The reverted patch did not allow the kernel to use RDTSCP in the future,
because of missed trashing in svm_set_msr() and 64-bit ifdef.

This reverts commit 2b23c3a6e3.

2b23c3a6e3 ("KVM: SVM: do not set MSR_TSC_AUX on 32-bit builds") is a
build fix for 9770404a00 and reverting them separately would only
break more bisections.

Cc: stable@vger.kernel.org
2016-07-16 15:26:54 +02:00
Daniel Borkmann
7e3f977edd perf, events: add non-linear data support for raw records
This patch adds support for non-linear data on raw records. It
extends raw records to have one or multiple fragments that will
be written linearly into the ring slot, where each fragment can
optionally have a custom callback handler to walk and extract
complex, possibly non-linear data.

If a callback handler is provided for a fragment, then the new
__output_custom() will be used instead of __output_copy() for
the perf_output_sample() part. perf_prepare_sample() does all
the size calculation only once, so perf_output_sample() doesn't
need to redo the same work anymore, meaning real_size and padding
will be cached in the raw record. The raw record becomes 32 bytes
in size without holes; to not increase it further and to avoid
doing unnecessary recalculations in fast-path, we can reuse
next pointer of the last fragment, idea here is borrowed from
ZERO_OR_NULL_PTR(), which should keep the perf_output_sample()
path for PERF_SAMPLE_RAW minimal.

This facility is needed for BPF's event output helper as a first
user that will, in a follow-up, add an additional perf_raw_frag
to its perf_raw_record in order to be able to more efficiently
dump skb context after a linear head meta data related to it.
skbs can be non-linear and thus need a custom output function to
dump buffers. Currently, the skb data needs to be copied twice;
with the help of __output_custom() this work only needs to be
done once. Future users could be things like XDP/BPF programs
that work on different context though and would thus also have
a different callback function.

The few users of raw records are adapted to initialize their frag
data from the raw record itself, no change in behavior for them.
The code is based upon a PoC diff provided by Peter Zijlstra [1].

  [1] http://thread.gmane.org/gmane.linux.network/421294

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15 14:23:56 -07:00
Rafael J. Wysocki
406f992e4a x86 / hibernate: Use hlt_play_dead() when resuming from hibernation
On Intel hardware, native_play_dead() uses mwait_play_dead() by
default and only falls back to the other methods if that fails.
That also happens during resume from hibernation, when the restore
(boot) kernel runs disable_nonboot_cpus() to take all of the CPUs
except for the boot one offline.

However, that is problematic, because the address passed to
__monitor() in mwait_play_dead() is likely to be written to in the
last phase of hibernate image restoration and that causes the "dead"
CPU to start executing instructions again.  Unfortunately, the page
containing the address in that CPU's instruction pointer may not be
valid any more at that point.

First, that page may have been overwritten with image kernel memory
contents already, so the instructions the CPU attempts to execute may
simply be invalid.  Second, the page tables previously used by that
CPU may have been overwritten by image kernel memory contents, so the
address in its instruction pointer is impossible to resolve then.

A report from Varun Koyyalagunta and investigation carried out by
Chen Yu show that the latter sometimes happens in practice.

To prevent it from happening, temporarily change the smp_ops.play_dead
pointer during resume from hibernation so that it points to a special
"play dead" routine which uses hlt_play_dead() and avoids the
inadvertent "revivals" of "dead" CPUs this way.

A slightly unpleasant consequence of this change is that if the
system is hibernated with one or more CPUs offline, it will generally
draw more power after resume than it did before hibernation, because
the physical state entered by CPUs via hlt_play_dead() is higher-power
than the mwait_play_dead() one in the majority of cases.  It is
possible to work around this, but it is unclear how much of a problem
that's going to be in practice, so the workaround will be implemented
later if it turns out to be necessary.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=106371
Reported-by: Varun Koyyalagunta <cpudebug@centtech.com>
Original-by: Chen Yu <yu.c.chen@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 22:42:48 +02:00
Sebastian Andrzej Siewior
6b2c28471d x86/x2apic: Convert to CPU hotplug state machine
Install the callbacks via the state machine and let the core invoke
the callbacks on the already online CPUs.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160713153337.736898691@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:41:42 +02:00
Richard Cochran
ae6a8a2ed7 x86/tboot: Convert to hotplug state machine
Install the callbacks via the state machine and let the core invoke
the callbacks on the already online CPUs.

Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Gang Wei <gang.wei@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ning Sun <ning.sun@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard L Maliszewski <richard.l.maliszewski@intel.com>
Cc: Shane Wang <shane.wang@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: rt@linutronix.de
Cc: tboot-devel@lists.sourceforge.net
Link: http://lkml.kernel.org/r/20160713153337.400227322@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:40:30 +02:00
Sebastian Andrzej Siewior
148b9e2abe x86/apb_timer: Convert to hotplug state machine
Install the callbacks via the state machine. There is no setup just one
teardown callback. Remove the silly comment about the workqueue up dependency.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160713153335.625342983@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:40:22 +02:00
Sebastian Andrzej Siewior
251a5fd64b x86/kvm/kvmclock: Convert to hotplug state machine
Install the callbacks via the state machine and let the core invoke
the callbacks on the already online CPUs.

We assumed that the priority ordering was ment to invoke the online
callback as the last step. In the original code this also invoked the
down prepare callback as the last step. With the symmetric state
machine the down prepare callback is now the first step.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160713153335.542880859@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:40:21 +02:00
Anna-Maria Gleixner
162e52a117 KVM/x86: Remove superfluous SMP function call
Since the following commit:

  1cf4f629d9 ("cpu/hotplug: Move online calls to hotplugged cpu")

... the CPU_ONLINE and CPU_DOWN_PREPARE notifiers are always run on the hot
plugged CPU, and as of commit:

  3b9d6da67e ("cpu/hotplug: Fix rollback during error-out in __cpu_disable()")

the CPU_DOWN_FAILED notifier also runs on the hot plugged CPU.  This patch
converts the SMP functional calls into direct calls.

smp_function_call_single() executes the function with interrupts
disabled. This calling convention is not preserved because there
is no reason to do so.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160713153335.452527104@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:40:21 +02:00
Wei Jiangang
102bb9fef6 x86/apic: Remove the unused struct apic::apic_id_mask field
The only user verify_local_APIC() had been removed by commit:

  4399c03c67 ("x86/apic: Remove verify_local_APIC()")

... so there is no need to keep it.

Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boris.ostrovsky@oracle.com
Cc: bsd@redhat.com
Cc: david.vrabel@citrix.com
Cc: jgross@suse.com
Cc: konrad.wilk@oracle.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1468463046-20849-1-git-send-email-weijg.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:39:05 +02:00
Ingo Molnar
07ccdcd34a Merge branch 'linus' into x86/apic, to refresh the branch
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:38:54 +02:00
Wei Jiangang
c48ec42d6e x86/tsc: Remove the unused check_tsc_disabled()
check_tsc_disabled() was introduced by commit:

  c73deb6aec ("perf/x86: Add ability to calculate TSC from perf sample timestamps")

The only caller was arch_perf_update_userpage(), which had been refactored
by commit:

  d8b11a0cbd ("perf/x86: Clean up cap_user_time* setting")

... so no need keep and export it any more.

Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: a.p.zijlstra@chello.nl
Cc: adrian.hunter@intel.com
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/1468570330-25810-1-git-send-email-weijg.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:35:08 +02:00
H.J. Lu
3ebfd81f7f x86/syscalls: Add compat_sys_preadv64v2/compat_sys_pwritev64v2
Don't use the same syscall numbers for 2 different syscalls:

 534	x32	preadv			compat_sys_preadv64
 535	x32	pwritev			compat_sys_pwritev64
 534	x32	preadv2			compat_sys_preadv2
 535	x32	pwritev2		compat_sys_pwritev2

Add compat_sys_preadv64v2() and compat_sys_pwritev64v2() so that 64-bit offset
is passed in one 64-bit register on x32, similar to compat_sys_preadv64()
and compat_sys_pwritev64().

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/CAMe9rOovCMf-RQfx_n1U_Tu_DX1BYkjtFr%3DQ4-_PFVSj9BCzUA@mail.gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:30:26 +02:00
Andy Lutomirski
eb43e8f85f x86/smp: Remove unnecessary initialization of thread_info::cpu
It's statically initialized to zero -- no need to dynamically
initialize it to zero as well.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/6cf6314dce3051371a913ee19d1b88e29c68c560.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:26:31 +02:00
Andy Lutomirski
fb59831b49 x86/smp: Remove stack_smp_processor_id()
It serves no purpose -- raw_smp_processor_id() works fine.  This
change will be needed to move thread_info off the stack.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a2bf4f07fbc30fb32f9f7f3f8f94ad3580823847.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:26:30 +02:00
Andy Lutomirski
13d4ea097d x86/uaccess: Move thread_info::addr_limit to thread_struct
struct thread_info is a legacy mess.  To prepare for its partial removal,
move thread_info::addr_limit out.

As an added benefit, this way is simpler.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/15bee834d09402b47ac86f2feccdf6529f9bc5b0.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:26:30 +02:00
Ingo Molnar
2a53ccbc0d x86/dumpstack: Rename thread_struct::sig_on_uaccess_error to sig_on_uaccess_err
Rename it to match the thread_struct::uaccess_err pattern and also
because it was too long.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:26:29 +02:00
Andy Lutomirski
dfa9a942fd x86/uaccess: Move thread_info::uaccess_err and thread_info::sig_on_uaccess_err to thread_struct
struct thread_info is a legacy mess.  To prepare for its partial removal,
move the uaccess control fields out -- they're straightforward.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/d0ac4d01c8e4d4d756264604e47445d5acc7900e.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:26:28 +02:00
Andy Lutomirski
2deb4be280 x86/dumpstack: When OOPSing, rewind the stack before do_exit()
If we call do_exit() with a clean stack, we greatly reduce the risk of
recursive oopses due to stack overflow in do_exit, and we allow
do_exit to work even if we OOPS from an IST stack.  The latter gives
us a much better chance of surviving long enough after we detect a
stack overflow to write out our logs.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/32f73ceb372ec61889598da5e5b145889b9f2e19.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:26:28 +02:00
Andy Lutomirski
46aea38734 x86/mm/64: In vmalloc_fault(), use CR3 instead of current->active_mm
If we get a vmalloc fault while current->active_mm->pgd doesn't
match CR3, we'll crash without this change.  I've seen this failure
mode on heavily instrumented kernels with virtually mapped stacks.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/4650d7674185f165ed8fdf9ac4c5c35c5c179ba8.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:26:27 +02:00
Andy Lutomirski
98f30b1207 x86/dumpstack/64: Handle faults when printing the "Stack: " part of an OOPS
If we overflow the stack into a guard page, we'll recursively fault
when trying to dump the contents of the guard page.  Use
probe_kernel_address() so we can recover if this happens.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e626d47a55d7b04dcb1b4d33faa95e8505b217c8.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:26:27 +02:00
Andy Lutomirski
9a2e9da3e0 x86/dumpstack: Try harder to get a call trace on stack overflow
If we overflow the stack, print_context_stack() will abort.  Detect
this case and rewind back into the valid part of the stack so that
we can trace it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/ee1690eb2715ccc5dc187fde94effa4ca0ccbbcd.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:26:26 +02:00
Andy Lutomirski
d92fc69cca x86/mm: Remove kernel_unmap_pages_in_pgd() and efi_cleanup_page_tables()
kernel_unmap_pages_in_pgd() is dangerous: if a PGD entry in
init_mm.pgd were to be cleared, callers would need to ensure that
the pgd entry hadn't been propagated to any other pgd.

Its only caller was efi_cleanup_page_tables(), and that, in turn,
was unused, so just delete both functions.  This leaves a couple of
other helpers unused, so delete them, too.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/77ff20fdde3b75cd393be5559ad8218870520248.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:26:26 +02:00
Andy Lutomirski
360cb4d155 x86/mm/cpa: In populate_pgd(), don't set the PGD entry until it's populated
This avoids pointless races in which another CPU or task might see a
partially populated global PGD entry.  These races should normally
be harmless, but, if another CPU propagates the entry via
vmalloc_fault() and then populate_pgd() fails (due to memory allocation
failure, for example), this prevents a use-after-free of the PGD
entry.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/bf99df27eac6835f687005364bd1fbd89130946c.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:26:25 +02:00
Ingo Molnar
af2cf278ef x86/mm/hotplug: Don't remove PGD entries in remove_pagetable()
So when memory hotplug removes a piece of physical memory from pagetable
mappings, it also frees the underlying PGD entry.

This complicates PGD management, so don't do this. We can keep the
PGD mapped and the PUD table all clear - it's only a single 4K page
per 512 GB of memory hotplugged.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <Waiman.Long@hp.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/064ff6c7275734537f969e876f6cd0baa954d2cc.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:26:24 +02:00
Ingo Molnar
38452af242 Merge branch 'x86/asm' into x86/mm, to resolve conflicts
Conflicts:
	tools/testing/selftests/x86/Makefile

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:26:04 +02:00
Dmitry Vyukov
2ba78056ac kasan: add newline to messages
Currently GPF messages with KASAN look as follows:

  kasan: GPF could be caused by NULL-ptr deref or user memory accessgeneral protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN

Add newlines.

Link: http://lkml.kernel.org/r/1467294357-98002-1-git-send-email-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-15 14:54:27 +09:00
Alex Hung
4d581259b7 x86/reboot: Add Dell Optiplex 7450 AIO reboot quirk
Dell Optiplex 7450 AIO works with BOOT_ACPI; however, the quirk for
"OptiPlex 745" changes its boot method to BOOT_BIOS and causes 7450 AIO
hangs when rebooting; as a result, 7450 AIO is appended to overwrite
BOOT_BIOS by BOOT_ACPI in order not to break the original 745 series

Signed-off-by: Alex Hung <alex.hung@canonical.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14 20:57:22 +02:00
Paolo Bonzini
2b23c3a6e3 KVM: SVM: do not set MSR_TSC_AUX on 32-bit builds
This is unnecessary---and besides, __getcpu() is not even
available on 32-bit builds.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14 19:11:29 +02:00
Jim Mattson
2f1fe81123 KVM: nVMX: Fix memory corruption when using VMCS shadowing
When freeing the nested resources of a vcpu, there is an assumption that
the vcpu's vmcs01 is the current VMCS on the CPU that executes
nested_release_vmcs12(). If this assumption is violated, the vcpu's
vmcs01 may be made active on multiple CPUs at the same time, in
violation of Intel's specification. Moreover, since the vcpu's vmcs01 is
not VMCLEARed on every CPU on which it is active, it can linger in a
CPU's VMCS cache after it has been freed and potentially
repurposed. Subsequent eviction from the CPU's VMCS cache on a capacity
miss can result in memory corruption.

It is not sufficient for vmx_free_vcpu() to call vmx_load_vmcs01(). If
the vcpu in question was last loaded on a different CPU, it must be
migrated to the current CPU before calling vmx_load_vmcs01().

Signed-off-by: Jim Mattson <jmattson@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14 19:11:20 +02:00
Peter Feiner
4e59516a12 kvm: vmx: ensure VMCS is current while enabling PML
Between loading the new VMCS and enabling PML, the CPU was unpinned.
If the vCPU thread were migrated to another CPU in the interim (e.g.,
due to preemption or sleeping alloc_page), then the VMWRITEs to enable
PML would target the wrong VMCS -- or no VMCS at all:

  [ 2087.266950] vmwrite error: reg 200e value 3fe1d52000 (err -506126336)
  [ 2087.267062] vmwrite error: reg 812 value 1ff (err 511)
  [ 2087.267125] vmwrite error: reg 401e value 12229c00 (err 304258048)

This patch ensures that the VMCS remains current while enabling PML by
doing the VMWRITEs while the CPU is pinned. Allocation of the PML buffer
is hoisted out of the critical section.

Signed-off-by: Peter Feiner <pfeiner@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14 19:11:19 +02:00
Paolo Bonzini
9770404a00 KVM: SVM: fix trashing of MSR_TSC_AUX
I don't know what I was thinking when I wrote commit 46896c73c1 ("KVM:
svm: add support for RDTSCP", 2015-11-12); I missed write_rdtscp_aux which
obviously uses MSR_TSC_AUX.

Therefore we do need to save/restore MSR_TSC_AUX in svm_vcpu_run.

Cc: stable@vger.kernel.org
Cc: Borislav Petkov <bp@alien8.de>
Fixes: 46896c73c1 ("KVM: svm: add support for RDTSCP")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14 19:11:10 +02:00
Paul Gortmaker
eb008eb6f8 x86: Audit and remove any remaining unnecessary uses of module.h
Historically a lot of these existed because we did not have
a distinction between what was modular code and what was providing
support to modules via EXPORT_SYMBOL and friends.  That changed
when we forked out support for the latter into the export.h file.

This means we should be able to reduce the usage of module.h
in code that is obj-y Makefile or bool Kconfig.  In the case of
some of these which are modular, we can extend that to also include
files that are building basic support functionality but not related
to loading or registering the final module; such files also have
no need whatsoever for module.h

The advantage in removing such instances is that module.h itself
sources about 15 other headers; adding significantly to what we feed
cpp, and it can obscure what headers we are effectively using.

Since module.h was the source for init.h (for __init) and for
export.h (for EXPORT_SYMBOL) we consider each instance for the
presence of either and replace as needed.

In the case of crypto/glue_helper.c we delete a redundant instance
of MODULE_LICENSE in order to delete module.h -- the license info
is already present at the top of the file.

The uncore change warrants a mention too; it is uncore.c that uses
module.h and not uncore.h; hence the relocation done there.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160714001901.31603-9-paul.gortmaker@windriver.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14 15:07:00 +02:00
Paul Gortmaker
1767e931e3 x86/kvm: Audit and remove any unnecessary uses of module.h
Historically a lot of these existed because we did not have
a distinction between what was modular code and what was providing
support to modules via EXPORT_SYMBOL and friends.  That changed
when we forked out support for the latter into the export.h file.

This means we should be able to reduce the usage of module.h
in code that is obj-y Makefile or bool Kconfig.  In the case of
kvm where it is modular, we can extend that to also include files
that are building basic support functionality but not related
to loading or registering the final module; such files also have
no need whatsoever for module.h

The advantage in removing such instances is that module.h itself
sources about 15 other headers; adding significantly to what we feed
cpp, and it can obscure what headers we are effectively using.

Since module.h was the source for init.h (for __init) and for
export.h (for EXPORT_SYMBOL) we consider each instance for the
presence of either and replace as needed.

Several instances got replaced with moduleparam.h since that was
really all that was required for those particular files.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Link: http://lkml.kernel.org/r/20160714001901.31603-8-paul.gortmaker@windriver.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14 15:07:00 +02:00
Paul Gortmaker
7a2463dcac x86/xen: Audit and remove any unnecessary uses of module.h
Historically a lot of these existed because we did not have
a distinction between what was modular code and what was providing
support to modules via EXPORT_SYMBOL and friends.  That changed
when we forked out support for the latter into the export.h file.

This means we should be able to reduce the usage of module.h
in code that is obj-y Makefile or bool Kconfig.  The advantage
in doing so is that module.h itself sources about 15 other headers;
adding significantly to what we feed cpp, and it can obscure what
headers we are effectively using.

Since module.h was the source for init.h (for __init) and for
export.h (for EXPORT_SYMBOL) we consider each obj-y/bool instance
for the presence of either and replace as needed.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20160714001901.31603-7-paul.gortmaker@windriver.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14 15:06:59 +02:00
Paul Gortmaker
cc3ae7b0af x86/platform: Audit and remove any unnecessary uses of module.h
Historically a lot of these existed because we did not have
a distinction between what was modular code and what was providing
support to modules via EXPORT_SYMBOL and friends.  That changed
when we forked out support for the latter into the export.h file.

This means we should be able to reduce the usage of module.h
in code that is obj-y Makefile or bool Kconfig.  The advantage
in doing so is that module.h itself sources about 15 other headers;
adding significantly to what we feed cpp, and it can obscure what
headers we are effectively using.

Since module.h was the source for init.h (for __init) and for
export.h (for EXPORT_SYMBOL) we consider each obj-y/bool instance
for the presence of either and replace as needed.

One module.h was converted to moduleparam.h since the file had
multiple module_param() in it, and another file had an instance of
MODULE_DEVICE_TABLE deleted, since that is a no-op when builtin.

Finally, the 32 bit build coverage of olpc_ofw revealed a couple
implicit includes, which were pretty self evident to fix based on
what gcc was complaining about.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160714001901.31603-6-paul.gortmaker@windriver.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14 15:06:59 +02:00
Paul Gortmaker
e683014c21 x86/lib: Audit and remove any unnecessary uses of module.h
Historically a lot of these existed because we did not have
a distinction between what was modular code and what was providing
support to modules via EXPORT_SYMBOL and friends.  That changed
when we forked out support for the latter into the export.h file.

This means we should be able to reduce the usage of module.h
in code that is obj-y Makefile or bool Kconfig.  The advantage
in doing so is that module.h itself sources about 15 other headers;
adding significantly to what we feed cpp, and it can obscure what
headers we are effectively using.

Since module.h was the source for init.h (for __init) and for
export.h (for EXPORT_SYMBOL) we consider each obj-y/bool instance
for the presence of either and replace as needed.  Build testing
revealed a couple implicit header usage issues that were fixed.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160714001901.31603-5-paul.gortmaker@windriver.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14 15:06:58 +02:00
Paul Gortmaker
186f43608a x86/kernel: Audit and remove any unnecessary uses of module.h
Historically a lot of these existed because we did not have
a distinction between what was modular code and what was providing
support to modules via EXPORT_SYMBOL and friends.  That changed
when we forked out support for the latter into the export.h file.

This means we should be able to reduce the usage of module.h
in code that is obj-y Makefile or bool Kconfig.  The advantage
in doing so is that module.h itself sources about 15 other headers;
adding significantly to what we feed cpp, and it can obscure what
headers we are effectively using.

Since module.h was the source for init.h (for __init) and for
export.h (for EXPORT_SYMBOL) we consider each obj-y/bool instance
for the presence of either and replace as needed.  Build testing
revealed some implicit header usage that was fixed up accordingly.

Note that some bool/obj-y instances remain since module.h is
the header for some exception table entry stuff, and for things
like __init_or_module (code that is tossed when MODULES=n).

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160714001901.31603-4-paul.gortmaker@windriver.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14 15:06:41 +02:00
Paul Gortmaker
4b599fedb7 x86/mm: Audit and remove any unnecessary uses of module.h
Historically a lot of these existed because we did not have
a distinction between what was modular code and what was providing
support to modules via EXPORT_SYMBOL and friends.  That changed
when we forked out support for the latter into the export.h file.

This means we should be able to reduce the usage of module.h
in code that is obj-y Makefile or bool Kconfig.  The advantage
in doing so is that module.h itself sources about 15 other headers;
adding significantly to what we feed cpp, and it can obscure what
headers we are effectively using.

Since module.h was the source for init.h (for __init) and for
export.h (for EXPORT_SYMBOL) we consider each obj-y/bool instance
for the presence of either and replace accordingly where needed.

Note that some bool/obj-y instances remain since module.h is
the header for some exception table entry stuff, and for things
like __init_or_module (code that is tossed when MODULES=n).

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160714001901.31603-3-paul.gortmaker@windriver.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14 13:04:20 +02:00
Paul Gortmaker
84e629b668 x86: Don't use module.h just for AUTHOR / LICENSE tags
The Kconfig controlling compilation of these files are:

 arch/x86/Kconfig.debug:config DEBUG_RODATA_TEST
 arch/x86/Kconfig.debug: bool "Testcase for the marking rodata read-only"

 arch/x86/Kconfig.debug:config X86_PTDUMP_CORE
 arch/x86/Kconfig.debug: def_bool n

...meaning that it currently is not being built as a module by anyone.

Lets remove the couple traces of modular infrastructure use, so that
when reading the driver there is no doubt it is builtin-only.

We delete the MODULE_LICENSE tag etc. since all that information
is already contained at the top of the file in the comments.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160714001901.31603-2-paul.gortmaker@windriver.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14 13:04:20 +02:00
Ingo Molnar
8b3843996d Merge branch 'x86/platform' into x86/headers, to apply dependent patches
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14 13:01:15 +02:00
Sebastian Andrzej Siewior
48d7f6c715 x86/hpet: Convert to hotplug state machine
Install the callbacks via the state machine and let the core invoke
the callbacks on the already online CPUs.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160713153335.279718463@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14 09:34:44 +02:00
Anna-Maria Gleixner
c6a84daa34 perf/x86/amd/power: Convert the hotplug notifier to state machine
Install the callbacks via the state machine.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160713153335.027571056@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14 09:34:42 +02:00
Anna-Maria Gleixner
8381f6a0c0 perf/x86/amd/power: Change hotplug notifier to a symmetric structure
To simplify the hotplug mechanism move the starting callback to
online. There is no functional requirement that the cpumask bit has to
be set in the starting callback.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160713153334.944849172@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14 09:34:41 +02:00
Sebastian Andrzej Siewior
77c34ef1c3 perf/x86/intel/cstate: Convert Intel CSTATE to hotplug state machine
Install the callbacks via the state machine and let the core invoke
the callbacks on the already online CPUs.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: kbuild test robot <fengguang.wu@intel.com>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160713153334.184061086@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14 09:34:35 +02:00
Richard Cochran
f070482704 perf/x86/intel/cqm: Convert Intel CQM to hotplug state machine
Install the callbacks via the state machine and let the core invoke
the callbacks on the already online CPUs.

Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160713153334.096956222@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14 09:34:35 +02:00
Richard Cochran
8b5b773d62 perf/x86/intel/rapl: Convert to hotplug state machine
Install the callbacks via the state machine and let the core invoke
the callbacks on the already online CPUs.

Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160713153334.008808086@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14 09:34:34 +02:00
Thomas Gleixner
9744f7b7b3 perf/x86/amd/ibs: Convert to hotplug state machine
Install the callbacks via the state machine and let the core invoke
the callbacks on the already online CPUs.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160713153333.921401190@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14 09:34:34 +02:00
Richard Cochran
96b2bd3866 perf/x86/amd/uncore: Convert to hotplug state machine
Install the callbacks via the state machine and let the core invoke
the callbacks on the already online CPUs.

Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160713153333.839150380@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14 09:34:33 +02:00
Thomas Gleixner
1a246b9f58 perf/x86/intel/uncore: Convert to hotplug state machine
Convert the notifiers to state machine states and let the core code do the
setup for the already online CPUs. This notifier has a completely undocumented
ordering requirement versus perf hardcoded in the notifier priority. This
odering is only required for CPU down, so that hardware migration happens
before the core is notified about the outgoing CPU.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160713153333.752695801@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14 09:34:32 +02:00
Thomas Gleixner
95ca792c75 perf/x86: Convert the core to the hotplug state machine
Replace the perf_notifier() install mechanism, which invokes magically
the callback on the current CPU. Convert the hardware specific
callbacks which are invoked from the x86 perf core to return proper
error codes instead of totally pointless NOTIFY_BAD return values.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Adam Borowski <kilobyte@angband.pl>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160713153333.670720553@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14 09:34:32 +02:00
Sebastian Andrzej Siewior
07d36c9e84 x86/vdso: Convert to hotplug state machine
Install the callbacks via the state machine and let the core invoke
the callbacks on the already online CPUs.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160713153332.987560239@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14 09:34:27 +02:00
Radim Krčmář
af1bae5497 KVM: x86: bump KVM_MAX_VCPU_ID to 1023
kzalloc was replaced with kvm_kvzalloc to allow non-contiguous areas and
rcu had to be modified to cope with it.

The practical limit for KVM_MAX_VCPU_ID right now is INT_MAX, but lower
value was chosen in case there were bugs.  1023 is sufficient maximum
APIC ID for 288 VCPUs.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14 09:29:35 +02:00
Radim Krčmář
682f732ecf KVM: x86: bump MAX_VCPUS to 288
288 is in high demand because of Knights Landing CPU.
We cannot set the limit to 640k, because that would be wasting space.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14 09:29:34 +02:00
Radim Krčmář
c519265f2a KVM: x86: add a flag to disable KVM x2apic broadcast quirk
Add KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK as a feature flag to
KVM_CAP_X2APIC_API.

The quirk made KVM interpret 0xff as a broadcast even in x2APIC mode.
The enableable capability is needed in order to support standard x2APIC and
remain backward compatible.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
[Expand kvm_apic_mda comment. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14 09:29:34 +02:00
Radim Krčmář
3713131345 KVM: x86: add KVM_CAP_X2APIC_API
KVM_CAP_X2APIC_API is a capability for features related to x2APIC
enablement.  KVM_X2APIC_API_32BIT_FORMAT feature can be enabled to
extend APIC ID in get/set ioctl and MSI addresses to 32 bits.
Both are needed to support x2APIC.

The feature has to be enableable and disabled by default, because
get/set ioctl shifted and truncated APIC ID to 8 bits by using a
non-standard protocol inspired by xAPIC and the change is not
backward-compatible.

Changes to MSI addresses follow the format used by interrupt remapping
unit.  The upper address word, that used to be 0, contains upper 24 bits
of the LAPIC address in its upper 24 bits.  Lower 8 bits are reserved as
0.  Using the upper address word is not backward-compatible either as we
didn't check that userspace zeroed the word.  Reserved bits are still
not explicitly checked, but non-zero data will affect LAPIC addresses,
which will cause a bug.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14 09:03:57 +02:00
Radim Krčmář
c63cf538eb KVM: pass struct kvm to kvm_set_routing_entry
Arch-specific code will use it.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14 09:03:56 +02:00
Radim Krčmář
4d8e772bf8 KVM: x86: reset lapic base in kvm_lapic_reset
LAPIC is reset in xAPIC mode and the surrounding code expects that.
KVM never resets after initialization.  This patch is just for sanity.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14 09:03:56 +02:00
Radim Krčmář
c93de59dcd KVM: VMX: optimize APIC ID read with APICv
The register is in hardware-compatible format now, so there is not need
to intercept.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14 09:03:55 +02:00
Radim Krčmář
49bd29ba1d KVM: x86: reset APIC ID when enabling LAPIC
APIC ID should be set to the initial APIC ID when enabling LAPIC.
This only matters if the guest changes APIC ID.  No sane OS does that.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14 09:03:54 +02:00
Radim Krčmář
a92e2543d6 KVM: x86: use hardware-compatible format for APIC ID register
We currently always shift APIC ID as if APIC was in xAPIC mode.
x2APIC mode wants to use more bits and storing a hardware-compabible
value is the the sanest option.

KVM API to set the lapic expects that bottom 8 bits of APIC ID are in
top 8 bits of APIC_ID register, so the register needs to be shifted in
x2APIC mode.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14 09:03:54 +02:00
Radim Krčmář
3159d36ad7 KVM: x86: use generic function for MSI parsing
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14 09:03:53 +02:00
Radim Krčmář
0ca52e7b81 KVM: x86: dynamic kvm_apic_map
x2APIC supports up to 2^32-1 LAPICs, but most guest in coming years will
probably has fewer VCPUs.  Dynamic size saves memory at the cost of
turning one constant into a variable.

apic_map mutex had to be moved before allocation to avoid races with cpu
hotplug.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14 09:03:53 +02:00
Radim Krčmář
e45115b62f KVM: x86: use physical LAPIC array for logical x2APIC
Logical x2APIC IDs map injectively to physical x2APIC IDs, so we can
reuse the physical array for them.  This allows us to save space by
sizing the logical maps according to the needs of xAPIC.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14 09:03:52 +02:00
Radim Krčmář
64aa47bfc4 KVM: x86: add kvm_apic_map_get_dest_lapic
kvm_irq_delivery_to_apic_fast and kvm_intr_is_single_vcpu_fast both
compute the interrupt destination.  Factor the code.

'struct kvm_lapic **dst = NULL' had to be added to silence GCC.
GCC might complain about potential NULL access in the future, because it
missed conditions that avoided uninitialized uses of dst.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14 09:03:51 +02:00
Radim Krčmář
757883de41 KVM: x86: bump KVM_SOFT_MAX_VCPUS to 240
240 has been well tested by Red Hat.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14 09:03:51 +02:00
Bandan Das
02120c45b0 kvm: vmx: advertise support for ept execute only
MMU now knows about execute only mappings, so
advertise the feature to L1 hypervisors

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14 09:03:50 +02:00
Bandan Das
d95c55687e kvm: mmu: track read permission explicitly for shadow EPT page tables
To support execute only mappings on behalf of L1 hypervisors,
reuse ACC_USER_MASK to signify if the L1 hypervisor has the R bit
set.

For the nested EPT case, we assumed that the U bit was always set
since there was no equivalent in EPT page tables.  Strictly
speaking, this was not necessary because handle_ept_violation
never set PFERR_USER_MASK in the error code (uf=0 in the
parlance of update_permission_bitmask).  We now have to set
both U and UF correctly, respectively in FNAME(gpte_access)
and in handle_ept_violation.

Also in handle_ept_violation bit 3 of the exit qualification is
not enough to detect a present PTE; all three bits 3-5 have to
be checked.

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14 09:03:50 +02:00
Bandan Das
ffb128c89b kvm: mmu: don't set the present bit unconditionally
To support execute only mappings on behalf of L1
hypervisors, we need to teach set_spte() to honor all three of
L1's XWR bits.  As a start, add a new variable "shadow_present_mask"
that will be set for non-EPT shadow paging and clear for EPT.

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14 09:03:14 +02:00
Bandan Das
812f30b234 kvm: mmu: remove is_present_gpte()
We have two versions of the above function.
To prevent confusion and bugs in the future, remove
the non-FNAME version entirely and replace all calls
with the actual check.

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14 09:02:47 +02:00
Bandan Das
8d5cf1610d kvm: mmu: extend the is_present check to 32 bits
This is safe because this function is called
on host controlled page table and non-present/non-MMIO
sptes never use bits 1..31. For the EPT case, this
ensures that cases where only the execute bit is set
is marked valid.

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14 09:02:46 +02:00
Linus Torvalds
f97d10454e Merge branches 'perf-urgent-for-linus' and 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf and timer fixes from Ingo Molnar:
 "A fix for a posix CPU timers bug, and a perf printk message fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Fix bogus kernel printk, again

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  posix_cpu_timer: Exit early when process has been reaped
2016-07-14 05:44:47 +09:00
Thomas Gleixner
54f5449677 Merge branch 'timers/core' into smp/hotplug to pick up dependencies 2016-07-13 17:01:51 +02:00
Dave Hansen
dcb32d9913 x86/mm: Use pte_none() to test for empty PTE
The page table manipulation code seems to have grown a couple of
sites that are looking for empty PTEs.  Just in case one of these
entries got a stray bit set, use pte_none() instead of checking
for a zero pte_val().

The use pte_same() makes me a bit nervous.  If we were doing a
pte_same() check against two cleared entries and one of them had
a stray bit set, it might fail the pte_same() check.  But, I
don't think we ever _do_ pte_same() for cleared entries.  It is
almost entirely used for checking for races in fault-in paths.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: dave.hansen@intel.com
Cc: linux-mm@kvack.org
Cc: mhocko@suse.com
Link: http://lkml.kernel.org/r/20160708001915.813703D9@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-13 09:43:25 +02:00
Dave Hansen
e4a84be6f0 x86/mm: Disallow running with 32-bit PTEs to work around erratum
The Intel(R) Xeon Phi(TM) Processor x200 Family (codename: Knights
Landing) has an erratum where a processor thread setting the Accessed
or Dirty bits may not do so atomically against its checks for the
Present bit.  This may cause a thread (which is about to page fault)
to set A and/or D, even though the Present bit had already been
atomically cleared.

These bits are truly "stray".  In the case of the Dirty bit, the
thread associated with the stray set was *not* allowed to write to
the page.  This means that we do not have to launder the bit(s); we
can simply ignore them.

If the PTE is used for storing a swap index or a NUMA migration index,
the A bit could be misinterpreted as part of the swap type.  The stray
bits being set cause a software-cleared PTE to be interpreted as a
swap entry.  In some cases (like when the swap index ends up being
for a non-existent swapfile), the kernel detects the stray value
and WARN()s about it, but there is no guarantee that the kernel can
always detect it.

When we have 64-bit PTEs (64-bit mode or 32-bit PAE), we were able
to move the swap PTE format around to avoid these troublesome bits.
But, 32-bit non-PAE is tight on bits.  So, disallow it from running
on this hardware.  I can't imagine anyone wanting to run 32-bit
non-highmem kernels on this hardware, but disallowing them from
running entirely is surely the safe thing to do.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: dave.hansen@intel.com
Cc: linux-mm@kvack.org
Cc: mhocko@suse.com
Link: http://lkml.kernel.org/r/20160708001914.D0B50110@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-13 09:43:25 +02:00
Dave Hansen
97e3c602cc x86/mm: Ignore A/D bits in pte/pmd/pud_none()
The erratum we are fixing here can lead to stray setting of the
A and D bits.  That means that a pte that we cleared might
suddenly have A/D set.  So, stop considering those bits when
determining if a pte is pte_none().  The same goes for the
other pmd_none() and pud_none().  pgd_none() can be skipped
because it is not affected; we do not use PGD entries for
anything other than pagetables on affected configurations.

This adds a tiny amount of overhead to all pte_none() checks.
I doubt we'll be able to measure it anywhere.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: dave.hansen@intel.com
Cc: linux-mm@kvack.org
Cc: mhocko@suse.com
Link: http://lkml.kernel.org/r/20160708001912.5216F89C@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-13 09:43:25 +02:00
Dave Hansen
00839ee3b2 x86/mm: Move swap offset/type up in PTE to work around erratum
This erratum can result in Accessed/Dirty getting set by the hardware
when we do not expect them to be (on !Present PTEs).

Instead of trying to fix them up after this happens, we just
allow the bits to get set and try to ignore them.  We do this by
shifting the layout of the bits we use for swap offset/type in
our 64-bit PTEs.

It looks like this:

 bitnrs: |     ...            | 11| 10|  9|8|7|6|5| 4| 3|2|1|0|
 names:  |     ...            |SW3|SW2|SW1|G|L|D|A|CD|WT|U|W|P|
 before: |         OFFSET (9-63)          |0|X|X| TYPE(1-5) |0|
  after: | OFFSET (14-63)  |  TYPE (9-13) |0|X|X|X| X| X|X|X|0|

Note that D was already a don't care (X) even before.  We just
move TYPE up and turn its old spot (which could be hit by the
A bit) into all don't cares.

We take 5 bits away from the offset, but that still leaves us
with 50 bits which lets us index into a 62-bit swapfile (4 EiB).
I think that's probably fine for the moment.  We could
theoretically reclaim 5 of the bits (1, 2, 3, 4, 7) but it
doesn't gain us anything.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: dave.hansen@intel.com
Cc: linux-mm@kvack.org
Cc: mhocko@suse.com
Link: http://lkml.kernel.org/r/20160708001911.9A3FD2B6@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-13 09:43:25 +02:00
Andy Shevchenko
05f310e26f x86/sfi: Enable enumeration of SD devices
SFI specification v0.8.2 defines type of devices which are connected to
SD bus. In particularly WiFi dongle is a such.

Add a callback to enumerate the devices connected to SD bus.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1468322192-62080-1-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-13 09:24:51 +02:00
Andy Shevchenko
707a605b5a x86/pci: Use MRFLD abbreviation for Merrifield
Everywhere in the kernel the MRFLD is used as abbreviation of Intel Merrifield.
Do the same in intel_mid_pci.c module.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1468321462-136016-1-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-13 09:24:51 +02:00
Dan Williams
7a9eb20666 pmem: kill __pmem address space
The __pmem address space was meant to annotate codepaths that touch
persistent memory and need to coordinate a call to wmb_pmem().  Now that
wmb_pmem() is gone, there is little need to keep this annotation.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-12 19:25:38 -07:00
Dan Williams
7c8a6a7190 pmem: kill wmb_pmem()
All users have been replaced with flushing in the pmem driver.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-12 15:13:48 -07:00
Tim Chen
eb9bc8e7af crypto: sha-mb - Cleanup code to use || instead of |
for condition comparison and cleanup multiline comment style

In sha*_ctx_mgr_submit, we currently use the | operator instead of ||
((ctx->partial_block_buffer_length) | (len < SHA1_BLOCK_SIZE))

Switching it to || and remove extraneous paranthesis to
adhere to coding style.

Also cleanup inconsistent multiline comment style.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-07-12 13:02:00 +08:00
Len Brown
ff4c86635e x86/tsc: Enumerate BXT tsc_khz via CPUID
Hard code the BXT crystal clock (aka ART - Always Running Timer)
to 19.200 MHz, and use CPUID leaf 0x15 to determine the BXT TSC frequency.

Use tsc_khz to sanity check BXT cpu_khz,
which can be erroneous in some configurations.

(I simplified the original patch from Bin Gao.)

Original-From: Bin Gao <bin.gao@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/bf4e7c175acd6d09719c47c319b10ff1f0627ff8.1466138954.git.len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-11 21:30:13 +02:00
Len Brown
aa297292d7 x86/tsc: Enumerate SKL cpu_khz and tsc_khz via CPUID
Skylake CPU base-frequency and TSC frequency may differ
by up to 2%.

Enumerate CPU and TSC frequencies separately, allowing
cpu_khz and tsc_khz to differ.

The existing CPU frequency calibration mechanism is unchanged.
However, CPUID extensions are preferred, when available.

CPUID.0x16 is preferred over MSR and timer calibration
for CPU frequency discovery.

CPUID.0x15 takes precedence over CPU-frequency
for TSC frequency discovery.

Signed-off-by: Len Brown <len.brown@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/b27ec289fd005833b27d694d9c2dbb716c5cdff7.1466138954.git.len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-11 21:30:13 +02:00
Len Brown
02c0cd2dcf x86/tsc_msr: Remove irqoff around MSR-based TSC enumeration
Remove the irqoff/irqon around MSR-based TSC enumeration,
as it is not necessary.

Also rename: try_msr_calibrate_tsc() to cpu_khz_from_msr(),
as that better describes what the routine does.

Signed-off-by: Len Brown <len.brown@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/a6b5c3ecd3b068175d2309599ab28163fc34215e.1466138954.git.len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-11 21:30:12 +02:00
Yu-cheng Yu
b8be15d588 x86/fpu/xstate: Re-enable XSAVES
We did not handle XSAVES instructions correctly. There were issues in
converting between standard and compacted format when interfacing with
user-space. These issues have been corrected.

Add a WARN_ONCE() to make it clear that XSAVES supervisor states are not
yet implemented.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: H. Peter Anvin <h.peter.anvin@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V Shankar <ravi.v.shankar@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1468253937-40008-5-git-send-email-fenghua.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-11 16:44:01 +02:00
Yu-cheng Yu
35ac2d7ba7 x86/fpu/xstate: Fix fpstate_init() for XRSTORS
In XSAVES mode if fpstate_init() is used to initialize a
task's extended state area, xsave.header.xcomp_bv[63] must
be set. Otherwise, when the task is scheduled, a warning is
triggered from copy_kernel_to_xregs().

One such test case is: setting an invalid extended state
through PTRACE. When xstateregs_set() rejects the syscall
and re-initializes the task's extended state area. This triggers
the warning mentioned above.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: H. Peter Anvin <h.peter.anvin@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V Shankar <ravi.v.shankar@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1468253937-40008-4-git-send-email-fenghua.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-11 16:44:00 +02:00
Yu-cheng Yu
5060b91513 x86/fpu/xstate: Return NULL for disabled xstate component address
It is an error to request a disabled XSAVE/XSAVES component address.
For that case, make __raw_xsave_addr() return a NULL and issue a
warning.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: H. Peter Anvin <h.peter.anvin@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V Shankar <ravi.v.shankar@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1468253937-40008-3-git-send-email-fenghua.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-11 16:44:00 +02:00
Yu-cheng Yu
1fc2b67b43 x86/fpu/xstate: Fix __fpu_restore_sig() for XSAVES
When the kernel is using XSAVES compacted format, we cannot do
__copy_from_user() from a signal frame, which has standard-format data.
Fix it by using copyin_to_xsaves(), which converts between formats and
filters out all supervisor states that we do not allow userspace to
write.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: H. Peter Anvin <h.peter.anvin@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V Shankar <ravi.v.shankar@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1468253937-40008-2-git-send-email-fenghua.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-11 16:43:59 +02:00
Paolo Bonzini
8391ce447f KVM: VMX: introduce vm_{entry,exit}_control_reset_shadow
There is no reason to read the entry/exit control fields of the
VMCS and immediately write back the same value.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-11 10:07:49 +02:00
Paolo Bonzini
9314006db8 KVM: nVMX: keep preemption timer enabled during L2 execution
Because the vmcs12 preemption timer is emulated through a separate hrtimer,
we can keep on using the preemption timer in the vmcs02 to emulare L1's
TSC deadline timer.

However, the corresponding bit in the pin-based execution control field
must be kept consistent between vmcs01 and vmcs02.  On vmentry we copy
it into the vmcs02; on vmexit the preemption timer must be disabled in
the vmcs01 if a preemption timer vmexit happened while in guest mode.

The preemption timer value in the vmcs02 is set by vmx_vcpu_run, so it
need not be considered in prepare_vmcs02.

Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Cc: Haozhong Zhang <haozhong.zhang@intel.com>
Tested-by: Wanpeng Li <kernellwp@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-11 09:49:29 +02:00
Wanpeng Li
55123e3c86 KVM: nVMX: avoid incorrect preemption timer vmexit in nested guest
The preemption timer for nested VMX is emulated by hrtimer which is started on L2
entry, stopped on L2 exit and evaluated via the check_nested_events hook. However,
nested_vmx_exit_handled is always returning true for preemption timer vmexit.  Then,
the L1 preemption timer vmexit is captured and be treated as a L2 preemption
timer vmexit, causing NULL pointer dereferences or worse in the L1 guest's
vmexit handler:

    BUG: unable to handle kernel NULL pointer dereference at           (null)
    IP: [<          (null)>]           (null)
    PGD 0
    Oops: 0010 [#1] SMP
    Call Trace:
     ? kvm_lapic_expired_hv_timer+0x47/0x90 [kvm]
     handle_preemption_timer+0xe/0x20 [kvm_intel]
     vmx_handle_exit+0x169/0x15a0 [kvm_intel]
     ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
     kvm_arch_vcpu_ioctl_run+0xdee/0x19d0 [kvm]
     ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
     ? vcpu_load+0x1c/0x60 [kvm]
     ? kvm_arch_vcpu_load+0x57/0x260 [kvm]
     kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
     do_vfs_ioctl+0x96/0x6a0
     ? __fget_light+0x2a/0x90
     SyS_ioctl+0x79/0x90
     do_syscall_64+0x68/0x180
     entry_SYSCALL64_slow_path+0x25/0x25
    Code:  Bad RIP value.
    RIP  [<          (null)>]           (null)
     RSP <ffff8800b5263c48>
    CR2: 0000000000000000
    ---[ end trace 9c70c48b1a2bc66e ]---

This can be reproduced readily by preemption timer enabled on L0 and disabled
on L1.

Return false since preemption timer vmexits must never be reflected to L2.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-11 09:49:26 +02:00
Paolo Bonzini
1c17c3e6bf KVM: VMX: reflect broken preemption timer in vmcs_config
Simplify cpu_has_vmx_preemption_timer.  This is consistent with the
rest of setup_vmcs_config and preparatory for the next patch.

Tested-by: Wanpeng Li <kernellwp@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-11 09:48:49 +02:00
Ingo Molnar
44530d588e Revert "perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86"
This reverts commit 2c95afc1e8.

Stephane reported the following regression:

 > Since Andi added:
 >
 > commit 2c95afc1e8
 > Author: Andi Kleen <ak@linux.intel.com>
 > Date:   Thu Jun 9 06:14:38 2016 -0700
 >
 >    perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86
 >
 > $ perf stat -e ref-cycles ls
 >   <not counted> ....
 >
 > fails systematically because the ref-cycles is now used by the
 > watchdog and given this is a system-wide pinned event, it monopolizes
 > the fixed counter 2 which is the only counter able to measure this event.

Since the next merge window is near, fix the regression for now
by reverting the commit.

Reported-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10 20:58:36 +02:00
Lukas Wunner
abb2bafd29 x86/quirks: Add early quirk to reset Apple AirPort card
The EFI firmware on Macs contains a full-fledged network stack for
downloading OS X images from osrecovery.apple.com. Unfortunately
on Macs introduced 2011 and 2012, EFI brings up the Broadcom 4331
wireless card on every boot and leaves it enabled even after
ExitBootServices has been called. The card continues to assert its IRQ
line, causing spurious interrupts if the IRQ is shared. It also corrupts
memory by DMAing received packets, allowing for remote code execution
over the air. This only stops when a driver is loaded for the wireless
card, which may be never if the driver is not installed or blacklisted.

The issue seems to be constrained to the Broadcom 4331. Chris Milsted
has verified that the newer Broadcom 4360 built into the MacBookPro11,3
(2013/2014) does not exhibit this behaviour. The chances that Apple will
ever supply a firmware fix for the older machines appear to be zero.

The solution is to reset the card on boot by writing to a reset bit in
its mmio space. This must be done as an early quirk and not as a plain
vanilla PCI quirk to successfully combat memory corruption by DMAed
packets: Matthew Garrett found out in 2012 that the packets are written
to EfiBootServicesData memory (http://mjg59.dreamwidth.org/11235.html).
This type of memory is made available to the page allocator by
efi_free_boot_services(). Plain vanilla PCI quirks run much later, in
subsys initcall level. In-between a time window would be open for memory
corruption. Random crashes occurring in this time window and attributed
to DMAed packets have indeed been observed in the wild by Chris
Bainbridge.

When Matthew Garrett analyzed the memory corruption issue in 2012, he
sought to fix it with a grub quirk which transitions the card to D3hot:
http://git.savannah.gnu.org/cgit/grub.git/commit/?id=9d34bb85da56

This approach does not help users with other bootloaders and while it
may prevent DMAed packets, it does not cure the spurious interrupts
emanating from the card. Unfortunately the card's mmio space is
inaccessible in D3hot, so to reset it, we have to undo the effect of
Matthew's grub patch and transition the card back to D0.

Note that the quirk takes a few shortcuts to reduce the amount of code:
The size of BAR 0 and the location of the PM capability is identical
on all affected machines and therefore hardcoded. Only the address of
BAR 0 differs between models. Also, it is assumed that the BCMA core
currently mapped is the 802.11 core. The EFI driver seems to always take
care of this.

Michael Büsch, Bjorn Helgaas and Matt Fleming contributed feedback
towards finding the best solution to this problem.

The following should be a comprehensive list of affected models:
    iMac13,1        2012  21.5"       [Root Port 00:1c.3 = 8086:1e16]
    iMac13,2        2012  27"         [Root Port 00:1c.3 = 8086:1e16]
    Macmini5,1      2011  i5 2.3 GHz  [Root Port 00:1c.1 = 8086:1c12]
    Macmini5,2      2011  i5 2.5 GHz  [Root Port 00:1c.1 = 8086:1c12]
    Macmini5,3      2011  i7 2.0 GHz  [Root Port 00:1c.1 = 8086:1c12]
    Macmini6,1      2012  i5 2.5 GHz  [Root Port 00:1c.1 = 8086:1e12]
    Macmini6,2      2012  i7 2.3 GHz  [Root Port 00:1c.1 = 8086:1e12]
    MacBookPro8,1   2011  13"         [Root Port 00:1c.1 = 8086:1c12]
    MacBookPro8,2   2011  15"         [Root Port 00:1c.1 = 8086:1c12]
    MacBookPro8,3   2011  17"         [Root Port 00:1c.1 = 8086:1c12]
    MacBookPro9,1   2012  15"         [Root Port 00:1c.1 = 8086:1e12]
    MacBookPro9,2   2012  13"         [Root Port 00:1c.1 = 8086:1e12]
    MacBookPro10,1  2012  15"         [Root Port 00:1c.1 = 8086:1e12]
    MacBookPro10,2  2012  13"         [Root Port 00:1c.1 = 8086:1e12]

For posterity, spurious interrupts caused by the Broadcom 4331 wireless
card resulted in splats like this (stacktrace omitted):

    irq 17: nobody cared (try booting with the "irqpoll" option)
    handlers:
    [<ffffffff81374370>] pcie_isr
    [<ffffffffc0704550>] sdhci_irq [sdhci] threaded [<ffffffffc07013c0>] sdhci_thread_irq [sdhci]
    [<ffffffffc0a0b960>] azx_interrupt [snd_hda_codec]
    Disabling IRQ #17

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79301
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111781
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=728916
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=895951#c16
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1009819
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1098621
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1149632#c5
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1279130
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1332732
Tested-by: Konstantin Simanov <k.simanov@stlk.ru>        # [MacBookPro8,1]
Tested-by: Lukas Wunner <lukas@wunner.de>                # [MacBookPro9,1]
Tested-by: Bryan Paradis <bryan.paradis@gmail.com>       # [MacBookPro9,2]
Tested-by: Andrew Worsley <amworsley@gmail.com>          # [MacBookPro10,1]
Tested-by: Chris Bainbridge <chris.bainbridge@gmail.com> # [MacBookPro10,2]
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Rafał Miłecki <zajec5@gmail.com>
Acked-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chris Milsted <cmilsted@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Michael Buesch <m@bues.ch>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: b43-dev@lists.infradead.org
Cc: linux-pci@vger.kernel.org
Cc: linux-wireless@vger.kernel.org
Cc: stable@vger.kernel.org
Cc: stable@vger.kernel.org # 123456789abc: x86/quirks: Apply nvidia_bugs quirk only on root bus
Cc: stable@vger.kernel.org # 123456789abc: x86/quirks: Reintroduce scanning of secondary buses
Link: http://lkml.kernel.org/r/48d0972ac82a53d460e5fce77a07b2560db95203.1465690253.git.lukas@wunner.de
[ Did minor readability edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10 20:13:53 +02:00
Lukas Wunner
850c321027 x86/quirks: Reintroduce scanning of secondary buses
We used to scan secondary buses until the following commit that
was applied in 2009:

  8659c406ad ("x86: only scan the root bus in early PCI quirks")

which commit constrained early quirks to the root bus only. Its
motivation was to prevent application of the nvidia_bugs quirk
on secondary buses.

We're about to add a quirk to reset the Broadcom 4331 wireless card on
2011/2012 Macs, which is located on a secondary bus behind a PCIe root
port. To facilitate that, reintroduce scanning of secondary buses.

The commit message of 8659c406ad notes that scanning only the root bus
"saves quite some unnecessary scanning work". The algorithm used prior
to 8659c406ad was particularly time consuming because it scanned
buses 0 to 31 brute force. To avoid lengthening boot time, employ a
recursive strategy which only scans buses that are actually reachable
from the root bus.

Yinghai Lu pointed out that the secondary bus number read from a
bridge's config space may be invalid, in particular a value of 0 would
cause an infinite loop. The PCI core goes beyond that and recurses to a
child bus only if its bus number is greater than the parent bus number
(see pci_scan_bridge()). Since the root bus is numbered 0, this implies
that secondary buses may not be 0. Do the same on early scanning.

If this algorithm is found to significantly impact boot time or cause
infinite loops on broken hardware, it would be possible to limit its
recursion depth: The Broadcom 4331 quirk applies at depth 1, all others
at depth 0, so the bus need not be scanned deeper than that for now. An
alternative approach would be to revert to scanning only the root bus,
and apply the Broadcom 4331 quirk to the root ports 8086:1c12, 8086:1e12
and 8086:1e16. Apple always positioned the card behind either of these
three ports. The quirk would then check presence of the card in slot 0
below the root port and do its deed.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-pci@vger.kernel.org
Link: http://lkml.kernel.org/r/f0daa70dac1a9b2483abdb31887173eb6ab77bdf.1465690253.git.lukas@wunner.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10 20:13:53 +02:00
Lukas Wunner
447d29d1d3 x86/quirks: Apply nvidia_bugs quirk only on root bus
Since the following commit:

  8659c406ad ("x86: only scan the root bus in early PCI quirks")

... early quirks are only applied to devices on the root bus.

The motivation was to prevent application of the nvidia_bugs quirk on
secondary buses.

We're about to reintroduce scanning of secondary buses for a quirk to
reset the Broadcom 4331 wireless card on 2011/2012 Macs. To prevent
regressions, open code the requirement to apply nvidia_bugs only on the
root bus.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/4d5477c1d76b2f0387a780f2142bbcdd9fee869b.1465690253.git.lukas@wunner.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10 20:13:53 +02:00
Vegard Nossum
eb01950356 perf/x86: Fix bogus kernel printk, again
This showed up as "6Failed to access..." here.

Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 1b74dde7c4 ("x86/cpu: Convert printk(KERN_<LEVEL> ...) to pr_<level>(...)")
Link: http://lkml.kernel.org/r/1468170841-17045-1-git-send-email-vegard.nossum@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10 20:05:48 +02:00
Andy Shevchenko
06a3fcc44d x86/platform/intel-mid: Make vertical indentation consistent
The vertical indentation is kinda chaotic in intel-mid.h. Let's be
consistent with it.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1465992260-29897-1-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10 17:22:31 +02:00
Thomas Garnier
4ff5308744 x86/mm: Do not reference phys addr beyond kernel
The new physical address randomized KASLR implementation can cause the
kernel to be aligned close to the end of physical memory. In this case,
_brk_end aligned to PMD will go beyond what is expected safe and hit
the assert in __phys_addr_symbol():

	VIRTUAL_BUG_ON(y >= KERNEL_IMAGE_SIZE);

Instead, perform an inclusive range check to avoid incorrectly triggering
the assert:

	kernel BUG at arch/x86/mm/physaddr.c:38!
	invalid opcode: 0000 [#1] SMP
	...
	RIP: 0010:[<ffffffffbe055721>] __phys_addr_symbol+0x41/0x50
	...
	Call Trace:
	[<ffffffffbe052eb9>] cpa_process_alias+0xa9/0x210
	[<ffffffffbe109011>] ? do_raw_spin_unlock+0xc1/0x100
	[<ffffffffbe051eef>] __change_page_attr_set_clr+0x8cf/0xbd0
	[<ffffffffbe201a4d>] ? vm_unmap_aliases+0x7d/0x210
	[<ffffffffbe05237c>] change_page_attr_set_clr+0x18c/0x4e0
	[<ffffffffbe0534ec>] set_memory_4k+0x2c/0x40
	[<ffffffffbefb08b3>] check_bugs+0x28/0x2a
	[<ffffffffbefa4f52>] start_kernel+0x49d/0x4b9
	[<ffffffffbefa4120>] ? early_idt_handler_array+0x120/0x120
	[<ffffffffbefa4423>] x86_64_start_reservations+0x29/0x2b
	[<ffffffffbefa4568>] x86_64_start_kernel+0x143/0x152

Signed-off-by: Thomas Garnier <thgarnie@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sai Praneeth <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Link: http://lkml.kernel.org/r/20160615190545.GA26071@www.outflux.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10 17:21:37 +02:00
Yu-cheng Yu
ac73b27aea x86/fpu/xstate: Fix xstate_offsets, xstate_sizes for non-extended xstates
The arrays xstate_offsets[] and xstate_sizes[] record XSAVE standard-
format offsets and sizes. Values for non-extended state components
fpu and xmm's were not initialized or used. Ptrace format conversion
needs them. Fix it.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/cf3ea36cf30e2a99e37da6483e65446d018ff0a7.1466179491.git.yu-cheng.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10 17:12:11 +02:00
Yu-cheng Yu
996952e014 x86/fpu/xstate: Fix XSTATE component offset print out
Component offset print out was incorrect for XSAVES. Correct it and move
to a separate function.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/86602a8ac400626c6eca7125c3e15934866fc38e.1466179491.git.yu-cheng.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10 17:12:10 +02:00
Yu-cheng Yu
91c3dba7db x86/fpu/xstate: Fix PTRACE frames for XSAVES
XSAVES uses compacted format and is a kernel instruction. The kernel
should use standard-format, non-supervisor state data for PTRACE.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
[ Edited away artificial linebreaks. ]
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/de3d80949001305fe389799973b675cab055c457.1466179491.git.yu-cheng.yu@intel.com
[ Made various readability edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10 17:12:10 +02:00
Yu-cheng Yu
1499ce2dd4 x86/fpu/xstate: Fix supervisor xstate component offset
CPUID function 0x0d, sub function (i, i > 1) returns in ebx the offset of
xstate component i. Zero is returned for a supervisor state. A supervisor
state can only be saved by XSAVES and XSAVES uses a compacted format.
There is no fixed offset for a supervisor state. This patch checks and
makes sure a supervisor state offset is not recorded or mis-used. This has
no effect in practice as we currently use no supervisor states, but it
would be good to fix.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/81b29e40d35d4cec9f2511a856fe769f34935a3f.1466179491.git.yu-cheng.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10 17:12:10 +02:00
Yu-cheng Yu
03482e08a8 x86/fpu/xstate: Align xstate components according to CPUID
CPUID function 0x0d, sub function (i, i > 1) returns in ecx[1] the
alignment requirement of component 'i' when the compacted format is used.

If ecx[1] is 0, component 'i' is located immediately following the preceding
component. If ecx[1] is 1, component 'i' is located on the next 64-byte
boundary following the preceding component.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/331e2bef1a0a7a584f06adde095b6bbfbe166472.1466179491.git.yu-cheng.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10 17:12:10 +02:00
Ingo Molnar
08fb98f5bf Merge branch 'linus' into x86/fpu, to pick up fixes before applying new changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10 17:11:17 +02:00
Len Brown
6fcb41cdae x86/tsc_msr: Add Airmont reference clock values
per the Intel 64 and IA-32 Architecture Software Developer's Manual...

Add the reference clock for Intel Atom Processors
Based on the Airmont Microarchitecture.

Reported-by: Stephane Gasparini <stephane.gasparini@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/abc6a0f4b18281410da1a3f26e2819d8e03e144f.1466138954.git.len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10 17:00:13 +02:00
Len Brown
05680e7fa8 x86/tsc_msr: Correct Silvermont reference clock values
Atom processors use a 19.2 MHz crystal oscillator.

Early processors generate 100 MHz via 19.2 MHz * 26 / 5 = 99.84 MHz.

Later preocessor generate 100 MHz via 19.2 MHz * 125 / 24 = 100 MHz.

Update the Silvermont-based tables accordingly,
matching the Software Developers Manual.

Also, correct a 166 MHz entry that should have been 116 MHz,
and add a missing 80 MHz entry.

Reported-by: Stephane Gasparini <stephane.gasparini@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/5d7561655dfb066ff10801b423405bae4d1cfbe2.1466138954.git.len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10 17:00:13 +02:00
Len Brown
9e0cae9f62 x86/tsc_msr: Update comments, expand definitions
Syntax only, no functional change.

Signed-off-by: Len Brown <len.brown@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/8653a2dba21fef122fc7b29eafb750e2004d3976.1466138954.git.len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10 17:00:13 +02:00
Len Brown
14bb4e3486 x86/tsc_msr: Remove debugging messages
Debugging messages are not necessary after all of the
possible hardware failures that never occur.
Instead, this code can simply return 0.

This code also doesn't need to print in the success case.
tsc_init() already prints the TSC frequency,
and apic=debug is available if anybody really is
interested in printing the LAPIC frequency.

Signed-off-by: Len Brown <len.brown@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/cf03279a125b95dfa9b8d3d5b4a66de09cd04050.1466138954.git.len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10 17:00:13 +02:00
Len Brown
ba8268330d x86/tsc_msr: Identify Intel-specific code
try_msr_calibrate_tsc() is currently Intel-specific,
and should not execute on any other vendor's parts.

Signed-off-by: Len Brown <len.brown@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1fe23c052826bdcfeb3d45045aa02246078cb5a7.1466138954.git.len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10 17:00:12 +02:00
Len Brown
fc5f3ac247 Revert "x86/tsc: Add missing Cherrytrail frequency to the table"
This reverts commit:

  e2724e9d96 ("x86/tsc: Add missing Cherrytrail frequency to the table")

... as it is incomplete, and is replaced by a more complete patch
later in this series.

Signed-off-by: Len Brown <len.brown@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/2199d0e959f7f71a18827268b5d060f8d3831639.1466138954.git.len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10 17:00:12 +02:00
Paolo Bonzini
be8a18e2e9 x86/entry: Inline enter_from_user_mode()
This matches what is already done for prepare_exit_to_usermode(),
and saves about 60 clock cycles (4% speedup) with the benchmark
in the previous commit message.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Link: http://lkml.kernel.org/r/1466434712-31440-3-git-send-email-pbonzini@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10 13:33:02 +02:00
Paolo Bonzini
2e9d1e150a x86/entry: Avoid interrupt flag save and restore
Thanks to all the work that was done by Andy Lutomirski and others,
enter_from_user_mode() and prepare_exit_to_usermode() are now called only with
interrupts disabled.  Let's provide them a version of user_enter()/user_exit()
that skips saving and restoring the interrupt flag.

On an AMD-based machine I tested this patch on, with force-enabled
context tracking, the speed-up in system calls was 90 clock cycles or 6%,
measured with the following simple benchmark:

    #include <sys/signal.h>
    #include <time.h>
    #include <unistd.h>
    #include <stdio.h>

    unsigned long rdtsc()
    {
        unsigned long result;
        asm volatile("rdtsc; shl $32, %%rdx; mov %%eax, %%eax\n"
                     "or %%rdx, %%rax" : "=a" (result) : : "rdx");
        return result;
    }

    int main()
    {
        unsigned long tsc1, tsc2;
        int pid = getpid();
        int i;

        tsc1 = rdtsc();
        for (i = 0; i < 100000000; i++)
            kill(pid, SIGWINCH);
        tsc2 = rdtsc();

        printf("%ld\n", tsc2 - tsc1);
    }

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Link: http://lkml.kernel.org/r/1466434712-31440-2-git-send-email-pbonzini@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10 13:33:02 +02:00
Andy Shevchenko
a11836fa5a x86/platform/intel-mid: Mark regulators explicitly defined
Intel MID platforms are using explicitly defined regulators.

Let the regulator core know that we do not have any additional
regulators left. This lets it substitute unprovided regulators with
dummy ones.

Without this change when CONFIG_REGULATOR=y the USB driver fails on getting
"vbus" regulator and SDHCI can't get "vmmc" and "vqmmc" regulators either.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1468071929-77383-1-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10 10:33:41 +02:00
Dave Hansen
8709ed4d4b x86/cpu: Fix duplicated X86_BUG(9) macro
cpufeatures.h currently defines X86_BUG(9) twice on 32-bit:

	#define X86_BUG_NULL_SEG        X86_BUG(9) /* Nulling a selector preserves the base */
	...
	#ifdef CONFIG_X86_32
	#define X86_BUG_ESPFIX          X86_BUG(9) /* "" IRET to 16-bit SS corrupts ESP/RSP high bits */
	#endif

I think what happened was that this added the X86_BUG_ESPFIX, but
in an #ifdef below most of the bugs:

	58a5aac533 x86/entry/32: Introduce and use X86_BUG_ESPFIX instead of paravirt_enabled

Then this came along and added X86_BUG_NULL_SEG, but collided
with the earlier one that did the bug below the main block
defining all the X86_BUG()s.

	7a5d670487 x86/cpu: Probe the behavior of nulling out a segment at boot time

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20160618001503.CEE1B141@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-09 14:06:06 +02:00
Andy Shevchenko
62d855d3e7 x86/platform/intel-mid: Rename mrfl.c to mrfld.c
Use mrfld as an abbreviation of Merrifield to be consistent with the rest of
the code.

In the future we are going to add more files here prefixed with 'mrfld'.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1466265094-146113-1-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-09 14:02:09 +02:00
Ingo Molnar
52e31f89cc Merge branch 'linus' into x86/asm, to pick up fixes before merging new changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-09 10:43:49 +02:00
Linus Torvalds
a017f583ec Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Three fixes:

   - A boot crash fix with certain configs
   - a MAINTAINERS entry update
   - Documentation typo fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/Documentation: Fix various typos in Documentation/x86/ files
  x86/amd_nb: Fix boot crash on non-AMD systems
  MAINTAINERS: Update the Calgary IOMMU entry
2016-07-08 09:06:52 -07:00
Linus Torvalds
612807fe28 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Various fixes:

   - 32-bit callgraph bug fix
   - suboptimal event group scheduling bug fix
   - event constraint fixes for Broadwell/Skylake
   - RAPL module name collision fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Fix pmu::filter_match for SW-led groups
  x86/perf/intel/rapl: Fix module name collision with powercap intel-rapl
  perf/x86: Fix 32-bit perf user callgraph collection
  perf/x86/intel: Update event constraints when HT is off
2016-07-08 09:02:16 -07:00
Thomas Garnier
90397a4177 x86/mm: Add memory hotplug support for KASLR memory randomization
Add a new option (CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING) to define
the padding used for the physical memory mapping section when KASLR
memory is enabled. It ensures there is enough virtual address space when
CONFIG_MEMORY_HOTPLUG is used. The default value is 10 terabytes. If
CONFIG_MEMORY_HOTPLUG is not used, no space is reserved increasing the
entropy available.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Alexander Popov <alpopov@ptsecurity.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lv Zheng <lv.zheng@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/1466556426-32664-10-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08 17:35:21 +02:00
Thomas Garnier
a95ae27c2e x86/mm: Enable KASLR for vmalloc memory regions
Add vmalloc to the list of randomized memory regions.

The vmalloc memory region contains the allocation made through the vmalloc()
API. The allocations are done sequentially to prevent fragmentation and
each allocation address can easily be deduced especially from boot.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Alexander Popov <alpopov@ptsecurity.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lv Zheng <lv.zheng@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/1466556426-32664-8-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08 17:35:21 +02:00
Thomas Garnier
021182e52f x86/mm: Enable KASLR for physical mapping memory regions
Add the physical mapping in the list of randomized memory regions.

The physical memory mapping holds most allocations from boot and heap
allocators. Knowing the base address and physical memory size, an attacker
can deduce the PDE virtual address for the vDSO memory page. This attack
was demonstrated at CanSecWest 2016, in the following presentation:

  "Getting Physical: Extreme Abuse of Intel Based Paged Systems":
  https://github.com/n3k/CansecWest2016_Getting_Physical_Extreme_Abuse_of_Intel_Based_Paging_Systems/blob/master/Presentation/CanSec2016_Presentation.pdf

(See second part of the presentation).

The exploits used against Linux worked successfully against 4.6+ but
fail with KASLR memory enabled:

  https://github.com/n3k/CansecWest2016_Getting_Physical_Extreme_Abuse_of_Intel_Based_Paging_Systems/tree/master/Demos/Linux/exploits

Similar research was done at Google leading to this patch proposal.

Variants exists to overwrite /proc or /sys objects ACLs leading to
elevation of privileges. These variants were tested against 4.6+.

The page offset used by the compressed kernel retains the static value
since it is not yet randomized during this boot stage.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Alexander Popov <alpopov@ptsecurity.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lv Zheng <lv.zheng@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/1466556426-32664-7-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08 17:35:15 +02:00
Thomas Garnier
0483e1fa6e x86/mm: Implement ASLR for kernel memory regions
Randomizes the virtual address space of kernel memory regions for
x86_64. This first patch adds the infrastructure and does not randomize
any region. The following patches will randomize the physical memory
mapping, vmalloc and vmemmap regions.

This security feature mitigates exploits relying on predictable kernel
addresses. These addresses can be used to disclose the kernel modules
base addresses or corrupt specific structures to elevate privileges
bypassing the current implementation of KASLR. This feature can be
enabled with the CONFIG_RANDOMIZE_MEMORY option.

The order of each memory region is not changed. The feature looks at the
available space for the regions based on different configuration options
and randomizes the base and space between each. The size of the physical
memory mapping is the available physical memory. No performance impact
was detected while testing the feature.

Entropy is generated using the KASLR early boot functions now shared in
the lib directory (originally written by Kees Cook). Randomization is
done on PGD & PUD page table levels to increase possible addresses. The
physical memory mapping code was adapted to support PUD level virtual
addresses. This implementation on the best configuration provides 30,000
possible virtual addresses in average for each memory region.  An
additional low memory page is used to ensure each CPU can start with a
PGD aligned virtual address (for realmode).

x86/dump_pagetable was updated to correctly display each region.

Updated documentation on x86_64 memory layout accordingly.

Performance data, after all patches in the series:

Kernbench shows almost no difference (-+ less than 1%):

Before:

Average Optimal load -j 12 Run (std deviation): Elapsed Time 102.63 (1.2695)
User Time 1034.89 (1.18115) System Time 87.056 (0.456416) Percent CPU 1092.9
(13.892) Context Switches 199805 (3455.33) Sleeps 97907.8 (900.636)

After:

Average Optimal load -j 12 Run (std deviation): Elapsed Time 102.489 (1.10636)
User Time 1034.86 (1.36053) System Time 87.764 (0.49345) Percent CPU 1095
(12.7715) Context Switches 199036 (4298.1) Sleeps 97681.6 (1031.11)

Hackbench shows 0% difference on average (hackbench 90 repeated 10 times):

attemp,before,after 1,0.076,0.069 2,0.072,0.069 3,0.066,0.066 4,0.066,0.068
5,0.066,0.067 6,0.066,0.069 7,0.067,0.066 8,0.063,0.067 9,0.067,0.065
10,0.068,0.071 average,0.0677,0.0677

Signed-off-by: Thomas Garnier <thgarnie@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Alexander Popov <alpopov@ptsecurity.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lv Zheng <lv.zheng@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/1466556426-32664-6-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08 17:33:46 +02:00
Thomas Garnier
b234e8a090 x86/mm: Separate variable for trampoline PGD
Use a separate global variable to define the trampoline PGD used to
start other processors. This change will allow KALSR memory
randomization to change the trampoline PGD to be correctly aligned with
physical memory.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Alexander Popov <alpopov@ptsecurity.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lv Zheng <lv.zheng@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/1466556426-32664-5-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08 17:33:46 +02:00
Thomas Garnier
faa379332f x86/mm: Add PUD VA support for physical mapping
Minor change that allows early boot physical mapping of PUD level virtual
addresses. The current implementation expects the virtual address to be
PUD aligned. For KASLR memory randomization, we need to be able to
randomize the offset used on the PUD table.

It has no impact on current usage.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Alexander Popov <alpopov@ptsecurity.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lv Zheng <lv.zheng@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/1466556426-32664-4-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08 17:33:46 +02:00
Thomas Garnier
59b3d0206d x86/mm: Update physical mapping variable names
Change the variable names in kernel_physical_mapping_init() and related
functions to correctly reflect physical and virtual memory addresses.
Also add comments on each function to describe usage and alignment
constraints.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Alexander Popov <alpopov@ptsecurity.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lv Zheng <lv.zheng@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/1466556426-32664-3-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08 17:33:46 +02:00
Thomas Garnier
d899a7d146 x86/mm: Refactor KASLR entropy functions
Move the KASLR entropy functions into arch/x86/lib to be used in early
kernel boot for KASLR memory randomization.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Alexander Popov <alpopov@ptsecurity.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lv Zheng <lv.zheng@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/1466556426-32664-2-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08 17:33:45 +02:00
Ingo Molnar
9e7f7f5425 Merge branch 'x86/mm' into x86/boot, to pick up dependencies
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08 17:27:47 +02:00
Borislav Petkov
9a7e7b5718 x86/asm/entry: Make thunk's restore a local label
No need to have it appear in objdump output.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160708141016.GH3808@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08 16:47:01 +02:00
Baoquan He
6daa2ec0b3 x86/KASLR: Fix boot crash with certain memory configurations
Ye Xiaolong reported this boot crash:

|
|  XZ-compressed data is corrupt
|
|   -- System halted
|

Fix the bug in mem_avoid_overlap() of finding the earliest overlap.

Reported-and-tested-by: Ye Xiaolong <xiaolong.ye@intel.com>
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08 14:36:19 +02:00
Dmitry Safonov
b059a453b1 x86/vdso: Add mremap hook to vm_special_mapping
Add possibility for 32-bit user-space applications to move
the vDSO mapping.

Previously, when a user-space app called mremap() for the vDSO
address, in the syscall return path it would land on the previous
address of the vDSOpage, resulting in segmentation violation.

Now it lands fine and returns to userspace with a remapped vDSO.

This will also fix the context.vdso pointer for 64-bit, which does
not affect the user of vDSO after mremap() currently, but this
may change in the future.

As suggested by Andy, return -EINVAL for mremap() that would
split the vDSO image: that operation cannot possibly result in
a working system so reject it.

Renamed and moved the text_mapping structure declaration inside
map_vdso(), as it used only there and now it complements the
vvar_mapping variable.

There is still a problem for remapping the vDSO in glibc
applications: the linker relocates addresses for syscalls
on the vDSO page, so you need to relink with the new
addresses.

Without that the next syscall through glibc may fail:

  Program received signal SIGSEGV, Segmentation fault.
  #0  0xf7fd9b80 in __kernel_vsyscall ()
  #1  0xf7ec8238 in _exit () from /usr/lib32/libc.so.6

Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: 0x7f454c46@gmail.com
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160628113539.13606-2-dsafonov@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08 14:17:51 +02:00
Jiri Kosina
39380b80d7 x86/mm/pat, /dev/mem: Remove superfluous error message
Currently it's possible for broken (or malicious) userspace to flood a
kernel log indefinitely with messages a-la

	Program dmidecode tried to access /dev/mem between f0000->100000

because range_is_allowed() is case of CONFIG_STRICT_DEVMEM being turned on
dumps this information each and every time devmem_is_allowed() fails.

Reportedly userspace that is able to trigger contignuous flow of these
messages exists.

It would be possible to rate limit this message, but that'd have a
questionable value; the administrator wouldn't get information about all
the failing accessess, so then the information would be both superfluous
and incomplete at the same time :)

Returning EPERM (which is what is actually happening) is enough indication
for userspace what has happened; no need to log this particular error as
some sort of special condition.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1607081137020.24757@cbobk.fhfr.pm
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08 11:52:58 +02:00
Ingo Molnar
946e0f6ffc Linux 4.7-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXefulAAoJEHm+PkMAQRiG6nMH/2O1vcZeOtqmx2yCMUeXyKAT
 wG88XflXzf3rM7C7TiObEYVf/bbLleJ7saDLEeic7ButD5gyYacIuzylVnrcqfBc
 vinz4cOw5kvu9DrRkCKdOfiTAgwYtqQW+syJ8ZK4lPQuSxnPAs+F/FKSOpyUF5FN
 Dngr520KjYKBEtn27W9UDPChFRwQoWAlaOC534eusaArCJtHGHHiuq5TEDn2EIo8
 pUw2vwx5JiquSHOY34WLU7r+QoilovCQlUSsBQdLlPjfMB1QFtclPYa+5yEMjkT4
 wusOUOfS/zK0rV6KnEdc/SkpiVX5C9WpFiWUOdEeJ5mZ+KijVkaOa9K1EDx8jSM=
 =7Hwh
 -----END PGP SIGNATURE-----

Merge tag 'v4.7-rc6' into x86/mm, to merge fixes before applying new changes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08 11:51:28 +02:00
Borislav Petkov
81c2949f7f x86/dumpstack: Add show_stack_regs() and use it
Add a helper to dump supplied pt_regs and use it in the MSR exception
handling code to have precise stack traces pointing to the actual
function causing the MSR access exception and not the stack frame of the
exception handler itself.

The new output looks like this:

 unchecked MSR access error: RDMSR from 0xdeadbeef at rIP: 0xffffffff8102ddb6 (early_init_intel+0x16/0x3a0)
  00000000756e6547 ffffffff81c03f68 ffffffff81dd0940 ffffffff81c03f10
  ffffffff81d42e65 0000000001000000 ffffffff81c03f58 ffffffff81d3e5a3
  0000800000000000 ffffffff81800080 ffffffffffffffff 0000000000000000
 Call Trace:
  [<ffffffff81d42e65>] early_cpu_init+0xe7/0x136
  [<ffffffff81d3e5a3>] setup_arch+0xa5/0x9df
  [<ffffffff81d38bb9>] start_kernel+0x9f/0x43a
  [<ffffffff81d38294>] x86_64_start_reservations+0x2f/0x31
  [<ffffffff81d383fe>] x86_64_start_kernel+0x168/0x176

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1467671487-10344-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08 11:33:19 +02:00
Andy Lutomirski
ef16dd0c2a x86/dumpstack: Honor supplied @regs arg
The comment suggests that show_stack(NULL, NULL) should backtrace the
current context, but the code doesn't match the comment. If regs are
given, start the "Stack:" hexdump at regs->sp.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1467671487-10344-2-git-send-email-bp@alien8.de
Link: http://lkml.kernel.org/r/efcd79bf4106d61f1cd258c2caa87f3a0618eeac.1466036668.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08 11:33:18 +02:00
Borislav Petkov
eb06158ee1 x86/microcode: Remove unused symbol exports
It is not a module anymore and those can be retracted.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160704170551.GC7261@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08 11:30:58 +02:00
Ingo Molnar
846c791bf7 Linux 4.7-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXefulAAoJEHm+PkMAQRiG6nMH/2O1vcZeOtqmx2yCMUeXyKAT
 wG88XflXzf3rM7C7TiObEYVf/bbLleJ7saDLEeic7ButD5gyYacIuzylVnrcqfBc
 vinz4cOw5kvu9DrRkCKdOfiTAgwYtqQW+syJ8ZK4lPQuSxnPAs+F/FKSOpyUF5FN
 Dngr520KjYKBEtn27W9UDPChFRwQoWAlaOC534eusaArCJtHGHHiuq5TEDn2EIo8
 pUw2vwx5JiquSHOY34WLU7r+QoilovCQlUSsBQdLlPjfMB1QFtclPYa+5yEMjkT4
 wusOUOfS/zK0rV6KnEdc/SkpiVX5C9WpFiWUOdEeJ5mZ+KijVkaOa9K1EDx8jSM=
 =7Hwh
 -----END PGP SIGNATURE-----

Merge tag 'v4.7-rc6' into x86/microcode, to pick up fixes before merging new changes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08 11:30:40 +02:00
Borislav Petkov
38c54ccb2d x86/mce: Fix mce_rdmsrl() warning message
The MSR address we're dumping in there should be in hex, otherwise we
get funsies like:

  [    0.016000] WARNING: CPU: 1 PID: 0 at arch/x86/kernel/cpu/mcheck/mce.c:428 mce_rdmsrl+0xd9/0xe0
  [    0.016000] mce: Unable to read msr -1073733631!
				       ^^^^^^^^^^^

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1467968983-4874-5-git-send-email-bp@alien8.de
[ Fixed capitalization of 'MSR'. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08 11:29:26 +02:00
Yazen Ghannam
340e983ab8 x86/RAS/AMD: Reduce the number of IPIs when prepping error injection
We currently use wrmsr_on_cpu() 4 times when prepping for an error
injection. This will generate 4 IPIs for each MSR write. We can reduce
the number of IPIs to 1 by grouping the MSR writes and executing them
serially on the appropriate CPU.

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1467968983-4874-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08 11:29:26 +02:00
Aravind Gopalakrishnan
955d1427a9 x86/mce/AMD: Increase size of the bank_map type
Change bank_map type from 'char' to 'int' since we now have more than eight
banks in a system.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1467968983-4874-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08 11:29:25 +02:00
Ingo Molnar
bac931259a Linux 4.7-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXefulAAoJEHm+PkMAQRiG6nMH/2O1vcZeOtqmx2yCMUeXyKAT
 wG88XflXzf3rM7C7TiObEYVf/bbLleJ7saDLEeic7ButD5gyYacIuzylVnrcqfBc
 vinz4cOw5kvu9DrRkCKdOfiTAgwYtqQW+syJ8ZK4lPQuSxnPAs+F/FKSOpyUF5FN
 Dngr520KjYKBEtn27W9UDPChFRwQoWAlaOC534eusaArCJtHGHHiuq5TEDn2EIo8
 pUw2vwx5JiquSHOY34WLU7r+QoilovCQlUSsBQdLlPjfMB1QFtclPYa+5yEMjkT4
 wusOUOfS/zK0rV6KnEdc/SkpiVX5C9WpFiWUOdEeJ5mZ+KijVkaOa9K1EDx8jSM=
 =7Hwh
 -----END PGP SIGNATURE-----

Merge tag 'v4.7-rc6' into ras/core, to pick up fixes before merging new changes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08 11:29:07 +02:00
Andy Shevchenko
e81e11bc71 x86/platform/intel-mid: Enable spidev on Intel Edison boards
Intel Edison board provides one of the SPI bus for user's connected devices.

Append platform data to get spidev enumerated over it.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Dan O'Donovan <dan@emutex.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1467677690-90007-1-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08 11:15:34 +02:00
Andy Shevchenko
ca22312dc8 x86/platform/intel-mid: Extend PWRMU to support Penwell
Intel Penwell is one of the first SoCs in Intel MID series. It has slightly
older version of PWRMU IP, though it is compatible with one found on Intel
Tangier. Since we are not using (yet) any advanced stuff in the driver we may
safely re-use what it's done for Intel Tangier for now.

Extend PWRMU driver to support Intel Penwell by adding PCI ID and re-using
existing ->set_initial_state() function.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1467749348-100518-2-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08 11:00:06 +02:00
Andy Shevchenko
e99a0745bd x86/pci, x86/platform/intel_mid_pci: Remove duplicate power off code
Intel MID platforms (Moorestown, Medfield, Clovertrail, Merrifield) are
sharing the code in the intel_mid_pci.c module. There is no need to
power off specific Moorestown devices after the following commit:

  5823d0893e ("x86/platform/intel-mid: Add Power Management Unit driver")

... because the condition in mrfld_power_off_dev() is true for any platform
from the above list.

Remove duplicate power off certain devices on Intel Moorestown and rename
the affected functions to show that they are applied to any of Intel MID
platforms.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1467749348-100518-1-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08 11:00:05 +02:00
Linus Torvalds
7ed18e2d1b ACPI fixes for v4.7-rc7
- Fix a lock ordering issue in ACPICA introduced by a recent commit
    that attempted to fix a deadlock in the dynamic table loading code
    which in turn appeared after changes related to the handling of
    module-level AML also made in this cycle (Lv Zheng).
 
  - Fix a recent regression in the ACPI IRQ management code that may
    cause PCI drivers to be unable to register an IRQ if that IRQ
    happens to be shared with a device on the ISA bus, like the
    parallel port, by reverting one commit entirely and restoring the
    previous behavior in two other places (Sinan Kaya).
 
  - Fix a recent regression in the ACPI AML debugger introduced by
    the commit that removed incorrect usage of IS_ERR_VALUE() from
    multiple places (Lv Zheng).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJXfuIeAAoJEILEb/54YlRxWbIP/iLT8J15RhFOoRYX4HXyULym
 UxeqEHVySKaqQLYhkBRz0opclW6sJQ7GaKYpVR31V4eeoll1+iQeYBYWdyuezjwp
 em3LvtLXTiLWWfRVNHg9AUOv4wc4q41m98MZ5ZScgTsUcexv2R1tt0KzLE/HNy1T
 NU/7JyWBEF4AiFsfYBuqtknWudV7JF3/siJO+Q1o+HSA9DW5cdqR0K9oM+Sl9pTD
 3yLpVY8DNJGLSA/7MyJlyLmZ6eJmTQbxcLO7oCQliqJ9jjBKkC4r3fgfQcOcoltT
 S6O7ODcJwvqCA1sv271VyOckkhJmyT7zJi/L/yJOgiGRU1//0Vfyg6iRVou9OZmE
 VcQT79W+6W6saWucoNDX1CLm6GzLIHP2e6leooL710nNmtTUgrSc6C4FT1KGGs0R
 UTNPYKYtiaK54krsa48XonSCdGFIszeK8sa3DIjf6C/5AYX/eGBuGFm5dZlU/qzs
 BNv+GyT/Vt/Cu3cNJUrsMugwDL5sp31u44mfM1c99LMXmnwzkPylRGR1aaO4TYOy
 jE5LDwMryi5gpAF8/Es0AiU4Xa8O3p+AlD7PjZIUSEyMd9zr3fdpvF6dZwaiHP26
 FX6paRbXi10mkXYGnRCbTOBV8i3PuNiplaZUT+XotHlEXOvAyMhVCVSM2XCAo1Wm
 3IWSGzc5gwq1Pl4fihWi
 =vKn9
 -----END PGP SIGNATURE-----

Merge tag 'acpi-4.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
 "All of these fix recent regressions in ACPICA, in the ACPI PCI IRQ
  management code and in the ACPI AML debugger.

  Specifics:

   - Fix a lock ordering issue in ACPICA introduced by a recent commit
     that attempted to fix a deadlock in the dynamic table loading code
     which in turn appeared after changes related to the handling of
     module-level AML also made in this cycle (Lv Zheng).

   - Fix a recent regression in the ACPI IRQ management code that may
     cause PCI drivers to be unable to register an IRQ if that IRQ
     happens to be shared with a device on the ISA bus, like the
     parallel port, by reverting one commit entirely and restoring the
     previous behavior in two other places (Sinan Kaya).

   - Fix a recent regression in the ACPI AML debugger introduced by the
     commit that removed incorrect usage of IS_ERR_VALUE() from multiple
     places (Lv Zheng)"

* tag 'acpi-4.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI / debugger: Fix regression introduced by IS_ERR_VALUE() removal
  ACPICA: Namespace: Fix namespace/interpreter lock ordering
  ACPI,PCI,IRQ: separate ISA penalty calculation
  Revert "ACPI, PCI, IRQ: remove redundant code in acpi_irq_penalty_init()"
  ACPI,PCI,IRQ: factor in PCI possible
2016-07-07 20:49:41 -07:00
Rafael J. Wysocki
b6d90158c9 Merge branches 'acpica-fixes', 'acpi-pci-fixes' and 'acpi-debug-fixes'
* acpica-fixes:
  ACPICA: Namespace: Fix namespace/interpreter lock ordering

* acpi-pci-fixes:
  ACPI,PCI,IRQ: separate ISA penalty calculation
  Revert "ACPI, PCI, IRQ: remove redundant code in acpi_irq_penalty_init()"
  ACPI,PCI,IRQ: factor in PCI possible

* acpi-debug-fixes:
  ACPI / debugger: Fix regression introduced by IS_ERR_VALUE() removal
2016-07-07 23:37:37 +02:00
Rafael J. Wysocki
7fe39a2155 Merge branches 'pm-cpuidle-fixes' and 'pm-sleep-fixes'
* pm-cpuidle-fixes:
  cpuidle: Fix last_residency division

* pm-sleep-fixes:
  x86/power/64: Fix kernel text mapping corruption during image restoration
2016-07-07 23:17:20 +02:00
James Hogan
54b880caf1 kbuild, x86: Track generated headers with generated-y
Track generated header files which aren't already in genhdr-y, alongside
generic-y wrappers in the */include/generated/[uapi/]asm/ directories.
Currently only x86 generates extra headers in these directories, for the
purposes of enumerating system calls for different ABIs, and xen
hypercalls.

This will allow the asm-generic wrapper handling code to remove stale
wrappers when files are removed from generic-y, without also removing
these headers which are generated separately.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-kbuild@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: Michal Marek <mmarek@suse.com>
Link: http://lkml.kernel.org/r/1466808144-23209-2-git-send-email-james.hogan@imgtec.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-07-07 15:58:44 +02:00
Srinivas Pandruvada
a0c9b8cc43 x86: remove duplicate turbo ratio limit MSRs
Remove MSR_NHM_TURBO_RATIO_LIMIT and MSR_IVT_TURBO_RATIO_LIMIT as
they are duplicate.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-07 15:32:00 +02:00
Rafael J. Wysocki
5ae0553fcb Merge branch 'pm-cpufreq' into pm-cpu 2016-07-07 15:31:29 +02:00
Thomas Gleixner
f9c287ba38 timers, x86/mce: Initialize MCE restart timer as pinned
Pinned timers must carry the pinned attribute in the timer structure
itself, so convert the code to the new API.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Chris Mason <clm@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: George Spelvin <linux@sciencehorizons.net>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160704094341.215783439@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-07 10:25:14 +02:00
Thomas Gleixner
920a4a70c5 timers, x86/apic/uv: Initialize the UV heartbeat timer as pinned
Pinned timers must carry the pinned attribute in the timer structure
itself, so convert the code to the new API.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Chris Mason <clm@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: George Spelvin <linux@sciencehorizons.net>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160704094341.133837204@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-07 10:25:14 +02:00
Kan Liang
46866b59df perf/x86/intel/uncore: Add support for the Intel Skylake client uncore PMU
This patch adds full support for Intel SKL client uncore PMU:

 - Add support for SKL client CPU uncore PMU, which is similar to the
   BDW client PMU driver. (There are some differences in CBOX numbering
   and uncore control MSR.)
 - Add new support for SkyLake Mobile uncore PMUs, for both CPU and PCI
   uncore functionality.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1467208912-8179-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-07 09:03:29 +02:00
Peter Zijlstra
aefbc4d04c perf/x86/intel: Fix rdlbr_to() MSR reading typo
It helps to actually read the right MSR..

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Fixes: d4cf1949f9 ("perf/x86/intel: Add {rd,wr}lbr_{to,from} wrappers")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-07 09:03:28 +02:00
Ingo Molnar
3ebe3bd8fb Merge branch 'perf/urgent' into perf/core, to pick up fixes before merging new changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-07 08:58:23 +02:00
Ville Syrjälä
175a20c16f x86/perf/intel/rapl: Fix module name collision with powercap intel-rapl
Since commit 4b6e2571bf the rapl perf module calls itself intel-rapl. That
name was already in use by the rapl powercap driver, which now fails to load
if the perf module is loaded. Fix the problem by renaming the perf module to
intel-rapl-perf, so that both modules can coexist.

Fixes: 4b6e2571bf ("x86/perf/intel/rapl: Make the Intel RAPL PMU driver modular")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1466694409-3620-1-git-send-email-ville.syrjala@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-07-06 12:51:59 +02:00
Amitoj Kaur Chawla
585423c8c4 x86/xen: Use DIV_ROUND_UP
The kernel.h macro DIV_ROUND_UP performs the computation
(((n) + (d) - 1) /(d)) but is perhaps more readable.

The Coccinelle script used to make this change is as follows:
@haskernel@
@@

#include <linux/kernel.h>

@depends on haskernel@
expression n,d;
@@

(
- (n + d - 1) / d
+ DIV_ROUND_UP(n,d)
|
- (n + (d - 1)) / d
+ DIV_ROUND_UP(n,d)
)

Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-07-06 10:34:49 +01:00
Boris Ostrovsky
6ab9507ed9 xen/PMU: Log VPMU initialization error at lower level
This will match how PMU errors are reported at check_hw_exists()'s
msr_fail label, which is reached when VPMU initialzation fails.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-07-06 10:34:48 +01:00
Juergen Gross
ecb23dc6f2 xen: add steal_clock support on x86
The pv_time_ops structure contains a function pointer for the
"steal_clock" functionality used only by KVM and Xen on ARM. Xen on x86
uses its own mechanism to account for the "stolen" time a thread wasn't
able to run due to hypervisor scheduling.

Add support in Xen arch independent time handling for this feature by
moving it out of the arm arch into drivers/xen and remove the x86 Xen
hack.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-07-06 10:34:48 +01:00
Shannon Zhao
a62ed50030 XEN: EFI: Move x86 specific codes to architecture directory
Move x86 specific codes to architecture directory and export those EFI
runtime service functions. This will be useful for initializing runtime
service on ARM later.

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Tested-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
2016-07-06 10:34:46 +01:00
Shannon Zhao
243848fc01 xen/grant-table: Move xlated_setup_gnttab_pages to common place
Move xlated_setup_gnttab_pages to common place, so it can be reused by
ARM to setup grant table.

Rename it to xen_xlate_map_ballooned_pages.

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Tested-by: Julien Grall <julien.grall@arm.com>
2016-07-06 10:34:42 +01:00
Alexis Dambricourt
30b072ce03 KVM: MTRR: fix kvm_mtrr_check_gfn_range_consistency page fault
The following #PF may occurs:
[ 1403.317041] BUG: unable to handle kernel paging request at 0000000200000068
[ 1403.317045] IP: [<ffffffffc04c20b0>] __mtrr_lookup_var_next+0x10/0xa0 [kvm]

[ 1403.317123] Call Trace:
[ 1403.317134]  [<ffffffffc04c2a65>] ? kvm_mtrr_check_gfn_range_consistency+0xc5/0x120 [kvm]
[ 1403.317143]  [<ffffffffc04ac11f>] ? tdp_page_fault+0x9f/0x2c0 [kvm]
[ 1403.317152]  [<ffffffffc0498128>] ? kvm_set_msr_common+0x858/0xc00 [kvm]
[ 1403.317161]  [<ffffffffc04b8883>] ? x86_emulate_insn+0x273/0xd30 [kvm]
[ 1403.317171]  [<ffffffffc04c04e4>] ? kvm_cpuid+0x34/0x190 [kvm]
[ 1403.317180]  [<ffffffffc04a5bb9>] ? kvm_mmu_page_fault+0x59/0xe0 [kvm]
[ 1403.317183]  [<ffffffffc0d729e1>] ? vmx_handle_exit+0x1d1/0x14a0 [kvm_intel]
[ 1403.317185]  [<ffffffffc0d75f3f>] ? atomic_switch_perf_msrs+0x6f/0xa0 [kvm_intel]
[ 1403.317187]  [<ffffffffc0d7621d>] ? vmx_vcpu_run+0x2ad/0x420 [kvm_intel]
[ 1403.317196]  [<ffffffffc04a0962>] ? kvm_arch_vcpu_ioctl_run+0x622/0x1550 [kvm]
[ 1403.317204]  [<ffffffffc049abb9>] ? kvm_arch_vcpu_load+0x59/0x210 [kvm]
[ 1403.317206]  [<ffffffff81036245>] ? __kernel_fpu_end+0x35/0x100
[ 1403.317213]  [<ffffffffc0487eb6>] ? kvm_vcpu_ioctl+0x316/0x5d0 [kvm]
[ 1403.317215]  [<ffffffff81088225>] ? do_sigtimedwait+0xd5/0x220
[ 1403.317217]  [<ffffffff811f84dd>] ? do_vfs_ioctl+0x9d/0x5c0
[ 1403.317224]  [<ffffffffc04928ae>] ? kvm_on_user_return+0x3e/0x70 [kvm]
[ 1403.317225]  [<ffffffff811f8a74>] ? SyS_ioctl+0x74/0x80
[ 1403.317227]  [<ffffffff815bf0b6>] ? entry_SYSCALL_64_fastpath+0x1e/0xa8
[ 1403.317242] RIP  [<ffffffffc04c20b0>] __mtrr_lookup_var_next+0x10/0xa0 [kvm]

At mtrr_lookup_fixed_next(), when the condition
'if (iter->index >= ARRAY_SIZE(iter->mtrr_state->fixed_ranges))' becomes true,
mtrr_lookup_var_start() is called with iter->range with gargabe values from the
fixed MTRR union field. Then, list_prepare_entry() do not call list_entry()
initialization, keeping a garbage pointer in iter->range which is accessed in
the following __mtrr_lookup_var_next() call.

Fixes: f571c0973e
Signed-off-by: Alexis Dambricourt <alexis@blade-group.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-05 16:14:43 +02:00
Wei Yongjun
03f6a22a39 KVM: x86: Use ARRAY_SIZE instead of dividing sizeof array with sizeof an element
Use ARRAY_SIZE instead of dividing sizeof array with sizeof an element

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-05 14:41:45 +02:00
Rafael J. Wysocki
5d1191ab6c Merge back earlier cpufreq material for v4.8. 2016-07-04 13:21:43 +02:00
Thomas Gleixner
8658be133b Merge branch 'irq/for-block' into irq/core
Pull the irq affinity managing code which is in a seperate branch for block
developers to pull.
2016-07-04 12:26:05 +02:00
Thomas Gleixner
06ee6d571f genirq: Add affinity hint to irq allocation
Add an extra argument to the irq(domain) allocation functions, so we can hand
down affinity hints to the allocator. Thats necessary to implement proper
support for multiqueue devices.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: linux-block@vger.kernel.org
Cc: linux-pci@vger.kernel.org
Cc: linux-nvme@lists.infradead.org
Cc: axboe@fb.com
Cc: agordeev@redhat.com
Link: http://lkml.kernel.org/r/1467621574-8277-4-git-send-email-hch@lst.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-07-04 12:25:13 +02:00
Josh Poimboeuf
fc18822510 perf/x86: Fix 32-bit perf user callgraph collection
A basic perf callgraph record operation causes an immediate panic on a
32-bit kernel compiled with CONFIG_CC_STACKPROTECTOR=y:

  $ perf record -g ls
  Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: c0404fbd

  CPU: 0 PID: 998 Comm: ls Not tainted 4.7.0-rc5+ #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
   c0dd5967 ff7afe1c 00000086 f41dbc2c c07445a0 464c457f f41dbca8 f41dbc44
   c05646f4 f41dbca8 464c457f f41dbca8 464c457f f41dbc54 c04625be c0ce56fc
   c0404fbd f41dbc88 c0404fbd b74668f0 f41dc000 00000000 c0000000 00000000
  Call Trace:
   [<c07445a0>] dump_stack+0x58/0x78
   [<c05646f4>] panic+0x8e/0x1c6
   [<c04625be>] __stack_chk_fail+0x1e/0x30
   [<c0404fbd>] ? perf_callchain_user+0x22d/0x230
   [<c0404fbd>] perf_callchain_user+0x22d/0x230
   [<c055f89f>] get_perf_callchain+0x1ff/0x270
   [<c055f988>] perf_callchain+0x78/0x90
   [<c055c7eb>] perf_prepare_sample+0x24b/0x370
   [<c055c934>] perf_event_output_forward+0x24/0x70
   [<c05531c0>] __perf_event_overflow+0xa0/0x210
   [<c0550a93>] ? cpu_clock_event_read+0x43/0x50
   [<c0553431>] perf_swevent_hrtimer+0x101/0x180
   [<c0456235>] ? kmap_atomic_prot+0x35/0x140
   [<c056dc69>] ? get_page_from_freelist+0x279/0x950
   [<c058fdd8>] ? vma_interval_tree_remove+0x158/0x230
   [<c05939f4>] ? wp_page_copy.isra.82+0x2f4/0x630
   [<c05a050d>] ? page_add_file_rmap+0x1d/0x50
   [<c0565611>] ? unlock_page+0x61/0x80
   [<c0566755>] ? filemap_map_pages+0x305/0x320
   [<c059769f>] ? handle_mm_fault+0xb7f/0x1560
   [<c074cbeb>] ? timerqueue_del+0x1b/0x70
   [<c04cfefe>] ? __remove_hrtimer+0x2e/0x60
   [<c04d017b>] __hrtimer_run_queues+0xcb/0x2a0
   [<c0553330>] ? __perf_event_overflow+0x210/0x210
   [<c04d0a2a>] hrtimer_interrupt+0x8a/0x180
   [<c043ecc2>] local_apic_timer_interrupt+0x32/0x60
   [<c043f643>] smp_apic_timer_interrupt+0x33/0x50
   [<c0b0cd38>] apic_timer_interrupt+0x34/0x3c
  Kernel Offset: disabled
  ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: c0404fbd

The panic is caused by the fact that perf_callchain_user() mistakenly
assumes it's 64-bit only and ends up corrupting the stack.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: stable@vger.kernel.org # v4.5+
Fixes: 75925e1ad7 ("perf/x86: Optimize stack walk user accesses")
Link: http://lkml.kernel.org/r/1a547f5077ec30f75f9b57074837c3c80df86e5e.1467432113.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-03 10:43:00 +02:00
Stephane Eranian
9010ae4a8d perf/x86/intel: Update event constraints when HT is off
This patch updates the event constraints for non-PEBS mode for
Intel Broadwell and Skylake processors. When HT is off, each
CPU gets 8 generic counters. However, not all events can be
programmed on any of the 8 counters.  This patch adds the
constraints for the MEM_* events which can only be measured on the
bottom 4 counters. The constraints are also valid when HT is off
because, then, there are only 4 generic counters and they are the
bottom counters.

Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: kan.liang@intel.com
Link: http://lkml.kernel.org/r/1467411742-13245-1-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-03 10:39:53 +02:00
Dave Airlie
542d972221 Linux 4.7-rc5
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXcHi9AAoJEHm+PkMAQRiGSJ0H/2o4t9VWYmhyPC1sdIHoCExJ
 P4tBrcZYBmKcsOmIfnJDa5g/+IdhouEUM0v0fHPogS2UUWT9eRuJWYD3sY+HpEQ+
 heKTli8X73gsFB25odeIbIt0jAoSiiMYWDrWqLNsuUV1tjEYVA8rH0SM94FiOC/5
 7WVWXLTuH+Rm7JHP18BnKxmMMbzrTFmwisLMqFKyfZRRSlS+/ix7iLUNO9AFa39B
 YHxNPihLrZ0oONyCOAQoHTIXXrw0cQbxV2utg3vnMcCZdme2xOn+iXMntTSKfZ39
 iC9/T0vsO3R6OrRo2aDZAnCPUAniXnMEIhrKG37WMyXpj6cucZ/2QiNXcXviGV4=
 =iLte
 -----END PGP SIGNATURE-----

Back-merge tag 'v4.7-rc5' into drm-next

Linux 4.7-rc5

The fsl-dcu pull needs -rc3 so go to -rc5 for now.
2016-07-02 15:56:01 +10:00
Sinan Kaya
487cf917ed Revert "ACPI, PCI, IRQ: remove redundant code in acpi_irq_penalty_init()"
Trying to make the ISA and PCI init functionality common turned out
to be a bad idea, because the ISA path depends on external
functionality.

Restore the previous behavior and limit the refactoring to PCI
interrupts only.

Fixes: 1fcb6a813c "ACPI,PCI,IRQ: remove redundant code in acpi_irq_penalty_init()"
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Tested-by: Wim Osterholt <wim@djo.tudelft.nl>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-02 01:38:34 +02:00
Herbert Xu
02fa472afe crypto: aesni - Use crypto_cipher to derive rfc4106 subkey
Currently aesni uses an async ctr(aes) to derive the rfc4106
subkey, which was presumably copied over from the generic rfc4106
code.  Over there it's done that way because we already have a
ctr(aes) spawn.  But it is simply overkill for aesni since we
have to go get a ctr(aes) from scratch anyway.

This patch simplifies the subkey derivation by using a straight
aes cipher instead.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-07-01 23:45:06 +08:00
Wanpeng Li
196f20ca52 KVM: vmx: fix missed cancellation of TSC deadline timer
INFO: rcu_sched detected stalls on CPUs/tasks:
 1-...: (11800 GPs behind) idle=45d/140000000000000/0 softirq=0/0 fqs=21663
 (detected by 0, t=65016 jiffies, g=11500, c=11499, q=719)
Task dump for CPU 1:
qemu-system-x86 R  running task        0  3529   3525 0x00080808
 ffff8802021791a0 ffff880212895040 0000000000000001 00007f1c2c00db40
 ffff8801dd20fcd3 ffffc90002b98000 ffff8801dd20fc88 ffff8801dd20fcf8
 0000000000000286 ffff8801dd2ac538 ffff8801dd20fcc0 ffffffffc06949c9
Call Trace:
? kvm_write_guest_cached+0xb9/0x160 [kvm]
? __delay+0xf/0x20
? wait_lapic_expire+0x14a/0x200 [kvm]
? kvm_arch_vcpu_ioctl_run+0xcbe/0x1b00 [kvm]
? kvm_arch_vcpu_ioctl_run+0xe34/0x1b00 [kvm]
? kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
? __fget+0x5/0x210
? do_vfs_ioctl+0x96/0x6a0
? __fget_light+0x2a/0x90
? SyS_ioctl+0x79/0x90
? do_syscall_64+0x7c/0x1e0
? entry_SYSCALL64_slow_path+0x25/0x25

This can be reproduced readily by running a full dynticks guest(since hrtimer
in guest is heavily used) w/ lapic_timer_advance disabled.

If fail to program hardware preemption timer, we will fallback to hrtimer based
method, however, a previous programmed preemption timer miss to cancel in this
scenario which results in one hardware preemption timer and one hrtimer emulated
tsc deadline timer run simultaneously. So sometimes the target guest deadline
tsc is earlier than guest tsc, which leads to the computation in vmx_set_hv_timer
can underflow and cause delta_tsc to be set a huge value, then host soft lockup
as above.

This patch fix it by cancelling the previous programmed preemption timer if there
is once we failed to program the new preemption timer and fallback to hrtimer
based method.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-01 11:03:42 +02:00
Wanpeng Li
bd97ad0e7e KVM: x86: introduce cancel_hv_tscdeadline
Introduce cancel_hv_tscdeadline() to encapsulate preemption
timer cancel stuff.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-01 11:03:41 +02:00
Paolo Bonzini
9175d2e97b KVM: vmx: fix underflow in TSC deadline calculation
If the TSC deadline timer is programmed really close to the deadline or
even in the past, the computation in vmx_set_hv_timer can underflow and
cause delta_tsc to be set to a huge value.  This generally results
in vmx_set_hv_timer returning -ERANGE, but we can fix it by limiting
delta_tsc to be positive or zero.

Reported-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-01 11:03:39 +02:00
Paolo Bonzini
f2485b3e0c KVM: x86: use guest_exit_irqoff
This gains a few clock cycles per vmexit.  On Intel there is no need
anymore to enable the interrupts in vmx_handle_external_intr, since
we are using the "acknowledge interrupt on exit" feature.  AMD
needs to do that, and must be careful to avoid the interrupt shadow.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-01 11:03:38 +02:00
Paolo Bonzini
91fa0f8e9e KVM: x86: always use "acknowledge interrupt on exit"
This is necessary to simplify handle_external_intr in the next patch.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-01 11:03:36 +02:00
Paolo Bonzini
6edaa5307f KVM: remove kvm_guest_enter/exit wrappers
Use the functions from context_tracking.h directly.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-01 11:03:21 +02:00
Andy Shevchenko
0519e8b4cb x86/platform/intel-mid: Add pinctrl for Intel Merrifield
Intel Merrifield uses a special address space reserved for Family-Level
Interface Shim (FLIS) that allows consumers to mux and configure pins.

Create a platform device for it.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1467226894-107109-1-git-send-email-andriy.shevchenko@linux.intel.com
[ Fixed typo. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-01 10:12:39 +02:00
Dave Hansen
4b3b234f43 x86/cpu: Rename "WESTMERE2" family to "NEHALEM_G"
Len Brown noticed something was amiss in our INTEL_FAM6_*
definitions.  It seems like model 0x1F was a Nehalem part,
marketed as "Intel Core i7 and i5 Processors" (according to the
SDM).  But, although it was a Nehalem 0x1F had some uncore events
which were shared with Westmere.

Len also mentioned he thought it was called "Havendale", which
Wikipedia says was graphics-oriented and canceled:

	https://en.wikipedia.org/wiki/Nehalem_(microarchitecture)

So either way, it's probably not imporant what we call it, but
call it Nehalem to be accurate, and add a "G" since it seems
graphics-related.  If it were canceled that would be a good reason
why it's so sparsely and inconsistently referred to in the code.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160629192737.949C41A8@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-01 10:03:24 +02:00
Borislav Petkov
09c6c30e72 x86/amd_nb: Clean up init path
The initcall had unnecessary pr_notice() messages which are useless
noise on distro kernels.

Also, push the GART init error message where it belongs, *after* the
check whether the current hw we're loaded on, supports GART at all.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Battersby <tonyb@cybernetics.com>
Link: http://lkml.kernel.org/r/1466097230-5333-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-01 09:36:12 +02:00
Ingo Molnar
35c88cabb1 Merge branch 'x86/urgent' into x86/cpu, to pick up dependent fix
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-01 09:35:49 +02:00
Borislav Petkov
1ead852dd8 x86/amd_nb: Fix boot crash on non-AMD systems
Fix boot crash that triggers if this driver is built into a kernel and
run on non-AMD systems.

AMD northbridges users call amd_cache_northbridges() and it returns
a negative value to signal that we weren't able to cache/detect any
northbridges on the system.

At least, it should do so as all its callers expect it to do so. But it
does return a negative value only when kmalloc() fails.

Fix it to return -ENODEV if there are no NBs cached as otherwise, amd_nb
users like amd64_edac, for example, which relies on it to know whether
it should load or not, gets loaded on systems like Intel Xeons where it
shouldn't.

Reported-and-tested-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1466097230-5333-2-git-send-email-bp@alien8.de
Link: https://lkml.kernel.org/r/5761BEB0.9000807@cybernetics.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-01 09:35:35 +02:00
Rafael J. Wysocki
65c0554b73 x86/power/64: Fix kernel text mapping corruption during image restoration
Logan Gunthorpe reports that hibernation stopped working reliably for
him after commit ab76f7b4ab (x86/mm: Set NX on gap between __ex_table
and rodata).

That turns out to be a consequence of a long-standing issue with the
64-bit image restoration code on x86, which is that the temporary
page tables set up by it to avoid page tables corruption when the
last bits of the image kernel's memory contents are copied into
their original page frames re-use the boot kernel's text mapping,
but that mapping may very well get corrupted just like any other
part of the page tables.  Of course, if that happens, the final
jump to the image kernel's entry point will go to nowhere.

The exact reason why commit ab76f7b4ab matters here is that it
sometimes causes a PMD of a large page to be split into PTEs
that are allocated dynamically and get corrupted during image
restoration as described above.

To fix that issue note that the code copying the last bits of the
image kernel's memory contents to the page frames occupied by them
previoulsy doesn't use the kernel text mapping, because it runs from
a special page covered by the identity mapping set up for that code
from scratch.  Hence, the kernel text mapping is only needed before
that code starts to run and then it will only be used just for the
final jump to the image kernel's entry point.

Accordingly, the temporary page tables set up in swsusp_arch_resume()
on x86-64 need to contain the kernel text mapping too.  That mapping
is only going to be used for the final jump to the image kernel, so
it only needs to cover the image kernel's entry point, because the
first thing the image kernel does after getting control back is to
switch over to its own original page tables.  Moreover, the virtual
address of the image kernel's entry point in that mapping has to be
the same as the one mapped by the image kernel's page tables.

With that in mind, modify the x86-64's arch_hibernation_header_save()
and arch_hibernation_header_restore() routines to pass the physical
address of the image kernel's entry point (in addition to its virtual
address) to the boot kernel (a small piece of assembly code involved
in passing the entry point's virtual address to the image kernel is
not necessary any more after that, so drop it).  Update RESTORE_MAGIC
too to reflect the image header format change.

Next, in set_up_temporary_mappings(), use the physical and virtual
addresses of the image kernel's entry point passed in the image
header to set up a minimum kernel text mapping (using memory pages
that won't be overwritten by the image kernel's memory contents) that
will map those addresses to each other as appropriate.

This makes the concern about the possible corruption of the original
boot kernel text mapping go away and if the the minimum kernel text
mapping used for the final jump marks the image kernel's entry point
memory as executable, the jump to it is guaraneed to succeed.

Fixes: ab76f7b4ab (x86/mm: Set NX on gap between __ex_table and rodata)
Link: http://marc.info/?l=linux-pm&m=146372852823760&w=2
Reported-by: Logan Gunthorpe <logang@deltatee.com>
Reported-and-tested-by: Borislav Petkov <bp@suse.de>
Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-06-30 23:57:15 +02:00
Dave Hansen
8eda072e9d x86/cpufeature: Add helper macro for mask check macros
Every time we add a word to our cpu features, we need to add
something like this in two places:

	(((bit)>>5)==16 && (1UL<<((bit)&31) & REQUIRED_MASK16))

The trick is getting the "16" in this case in both places.  I've
now screwed this up twice, so as pennance, I've come up with
this patch to keep me and other poor souls from doing the same.

I also commented the logic behind the bit manipulation showcased
above.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160629200110.1BA8949E@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-30 09:11:32 +02:00
Dave Hansen
1e61f78baf x86/cpufeature: Make sure DISABLED/REQUIRED macros are updated
x86 has two macros which allow us to evaluate some CPUID-based
features at compile time:

	REQUIRED_MASK_BIT_SET()
	DISABLED_MASK_BIT_SET()

They're both defined by having the compiler check the bit
argument against some constant masks of features.

But, when adding new CPUID leaves, we need to check new words
for these macros.  So make sure that those macros and the
REQUIRED_MASK* and DISABLED_MASK* get updated when necessary.

This looks kinda silly to have an open-coded value ("18" in
this case) open-coded in 5 places in the code.  But, we really do
need 5 places updated when NCAPINTS gets bumped, so now we just
force the issue.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160629200108.92466F6F@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-30 09:11:32 +02:00
Dave Hansen
6e17cb9c2d x86/cpufeature: Update cpufeaure macros
We had a new CPUID "NCAPINT" word added, but the REQUIRED_MASK and
DISABLED_MASK macros did not get updated.  Update them.

None of the features was needed in these masks, so there was no
harm, but we should keep them updated anyway.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160629200107.8D3C9A31@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-30 09:11:32 +02:00
Megha Dey
bee5cfd9f6 crypto: sha512-mb - Crypto computation (x4 AVX2)
This patch introduces the assembly routines to do SHA512 computation on
buffers belonging to several jobs at once. The assembly routines are
optimized with AVX2 instructions that have 4 data lanes and using AVX2
registers.

Signed-off-by: Megha Dey <megha.dey@linux.intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-06-28 16:06:40 +08:00
Megha Dey
2cdacb68d7 crypto: sha512-mb - Algorithm data structures
This patch introduces the data structures and prototypes of functions
needed for computing SHA512 hash using multi-buffer. Included are the
structures of the multi-buffer SHA512 job, job scheduler in C and x86
assembly.

Signed-off-by: Megha Dey <megha.dey@linux.intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-06-28 16:06:39 +08:00
Megha Dey
45691e2d9b crypto: sha512-mb - submit/flush routines for AVX2
This patch introduces the routines used to submit and flush buffers
belonging to SHA512 crypto jobs to the SHA512 multibuffer algorithm.
It is implemented mostly in assembly optimized with AVX2 instructions.

Signed-off-by: Megha Dey <megha.dey@linux.intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-06-28 16:06:38 +08:00
Megha Dey
8c603ff286 crypto: sha512-mb - SHA512 multibuffer job manager and glue code
This patch introduces the multi-buffer job manager which is responsible
for submitting scatter-gather buffers from several SHA512 jobs to the
multi-buffer algorithm. It also contains the flush routine that's called
by the crypto daemon to complete the job when no new jobs arrive before
the deadline of maximum latency of a SHA512 crypto job.

The SHA512 multi-buffer crypto algorithm is defined and initialized in this
patch.

Signed-off-by: Megha Dey <megha.dey@linux.intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-06-28 16:06:35 +08:00
Quentin Casasnovas
ff30ef40de KVM: nVMX: VMX instructions: fix segment checks when L1 is in long mode.
I couldn't get Xen to boot a L2 HVM when it was nested under KVM - it was
getting a GP(0) on a rather unspecial vmread from Xen:

     (XEN) ----[ Xen-4.7.0-rc  x86_64  debug=n  Not tainted ]----
     (XEN) CPU:    1
     (XEN) RIP:    e008:[<ffff82d0801e629e>] vmx_get_segment_register+0x14e/0x450
     (XEN) RFLAGS: 0000000000010202   CONTEXT: hypervisor (d1v0)
     (XEN) rax: ffff82d0801e6288   rbx: ffff83003ffbfb7c   rcx: fffffffffffab928
     (XEN) rdx: 0000000000000000   rsi: 0000000000000000   rdi: ffff83000bdd0000
     (XEN) rbp: ffff83000bdd0000   rsp: ffff83003ffbfab0   r8:  ffff830038813910
     (XEN) r9:  ffff83003faf3958   r10: 0000000a3b9f7640   r11: ffff83003f82d418
     (XEN) r12: 0000000000000000   r13: ffff83003ffbffff   r14: 0000000000004802
     (XEN) r15: 0000000000000008   cr0: 0000000080050033   cr4: 00000000001526e0
     (XEN) cr3: 000000003fc79000   cr2: 0000000000000000
     (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
     (XEN) Xen code around <ffff82d0801e629e> (vmx_get_segment_register+0x14e/0x450):
     (XEN)  00 00 41 be 02 48 00 00 <44> 0f 78 74 24 08 0f 86 38 56 00 00 b8 08 68 00
     (XEN) Xen stack trace from rsp=ffff83003ffbfab0:

     ...

     (XEN) Xen call trace:
     (XEN)    [<ffff82d0801e629e>] vmx_get_segment_register+0x14e/0x450
     (XEN)    [<ffff82d0801f3695>] get_page_from_gfn_p2m+0x165/0x300
     (XEN)    [<ffff82d0801bfe32>] hvmemul_get_seg_reg+0x52/0x60
     (XEN)    [<ffff82d0801bfe93>] hvm_emulate_prepare+0x53/0x70
     (XEN)    [<ffff82d0801ccacb>] handle_mmio+0x2b/0xd0
     (XEN)    [<ffff82d0801be591>] emulate.c#_hvm_emulate_one+0x111/0x2c0
     (XEN)    [<ffff82d0801cd6a4>] handle_hvm_io_completion+0x274/0x2a0
     (XEN)    [<ffff82d0801f334a>] __get_gfn_type_access+0xfa/0x270
     (XEN)    [<ffff82d08012f3bb>] timer.c#add_entry+0x4b/0xb0
     (XEN)    [<ffff82d08012f80c>] timer.c#remove_entry+0x7c/0x90
     (XEN)    [<ffff82d0801c8433>] hvm_do_resume+0x23/0x140
     (XEN)    [<ffff82d0801e4fe7>] vmx_do_resume+0xa7/0x140
     (XEN)    [<ffff82d080164aeb>] context_switch+0x13b/0xe40
     (XEN)    [<ffff82d080128e6e>] schedule.c#schedule+0x22e/0x570
     (XEN)    [<ffff82d08012c0cc>] softirq.c#__do_softirq+0x5c/0x90
     (XEN)    [<ffff82d0801602c5>] domain.c#idle_loop+0x25/0x50
     (XEN)
     (XEN)
     (XEN) ****************************************
     (XEN) Panic on CPU 1:
     (XEN) GENERAL PROTECTION FAULT
     (XEN) [error_code=0000]
     (XEN) ****************************************

Tracing my host KVM showed it was the one injecting the GP(0) when
emulating the VMREAD and checking the destination segment permissions in
get_vmx_mem_address():

     3)               |    vmx_handle_exit() {
     3)               |      handle_vmread() {
     3)               |        nested_vmx_check_permission() {
     3)               |          vmx_get_segment() {
     3)   0.074 us    |            vmx_read_guest_seg_base();
     3)   0.065 us    |            vmx_read_guest_seg_selector();
     3)   0.066 us    |            vmx_read_guest_seg_ar();
     3)   1.636 us    |          }
     3)   0.058 us    |          vmx_get_rflags();
     3)   0.062 us    |          vmx_read_guest_seg_ar();
     3)   3.469 us    |        }
     3)               |        vmx_get_cs_db_l_bits() {
     3)   0.058 us    |          vmx_read_guest_seg_ar();
     3)   0.662 us    |        }
     3)               |        get_vmx_mem_address() {
     3)   0.068 us    |          vmx_cache_reg();
     3)               |          vmx_get_segment() {
     3)   0.074 us    |            vmx_read_guest_seg_base();
     3)   0.068 us    |            vmx_read_guest_seg_selector();
     3)   0.071 us    |            vmx_read_guest_seg_ar();
     3)   1.756 us    |          }
     3)               |          kvm_queue_exception_e() {
     3)   0.066 us    |            kvm_multiple_exception();
     3)   0.684 us    |          }
     3)   4.085 us    |        }
     3)   9.833 us    |      }
     3) + 10.366 us   |    }

Cross-checking the KVM/VMX VMREAD emulation code with the Intel Software
Developper Manual Volume 3C - "VMREAD - Read Field from Virtual-Machine
Control Structure", I found that we're enforcing that the destination
operand is NOT located in a read-only data segment or any code segment when
the L1 is in long mode - BUT that check should only happen when it is in
protected mode.

Shuffling the code a bit to make our emulation follow the specification
allows me to boot a Xen dom0 in a nested KVM and start HVM L2 guests
without problems.

Fixes: f9eb4af67c ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Eugene Korenevsky <ekorenevsky@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-27 15:30:44 +02:00
Marcelo Tosatti
b606f189c7 KVM: LAPIC: cap __delay at lapic_timer_advance_ns
The host timer which emulates the guest LAPIC TSC deadline
timer has its expiration diminished by lapic_timer_advance_ns
nanoseconds. Therefore if, at wait_lapic_expire, a difference
larger than lapic_timer_advance_ns is encountered, delay at most
lapic_timer_advance_ns.

This fixes a problem where the guest can cause the host
to delay for large amounts of time.

Reported-by: Alan Jenkins <alan.christopher.jenkins@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-27 15:30:41 +02:00
Marcelo Tosatti
8d93c874ac KVM: x86: move nsec_to_cycles from x86.c to x86.h
Move the inline function nsec_to_cycles from x86.c to x86.h, as
the next patch uses it from lapic.c.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-27 15:30:38 +02:00
Minfei Huang
ed911b43ad pvclock: Get rid of __pvclock_read_cycles in function pvclock_read_flags
There is a generic function __pvclock_read_cycles to be used to get both
flags and cycles. For function pvclock_read_flags, it's useless to get
cycles value. To make this function be more effective, get this variable
flags directly in function.

Signed-off-by: Minfei Huang <mnghuan@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-27 15:12:15 +02:00
Minfei Huang
f7550d076d pvclock: Cleanup to remove function pvclock_get_nsec_offset
Function __pvclock_read_cycles is short enough, so there is no need to
have another function pvclock_get_nsec_offset to calculate tsc delta.
It's better to combine it into function __pvclock_read_cycles.

Remove useless variables in function __pvclock_read_cycles.

Signed-off-by: Minfei Huang <mnghuan@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-27 15:12:14 +02:00
Minfei Huang
749d088b8e pvclock: Add CPU barriers to get correct version value
Protocol for the "version" fields is: hypervisor raises it (making it
uneven) before it starts updating the fields and raises it again (making
it even) when it is done.  Thus the guest can make sure the time values
it got are consistent by checking the version before and after reading
them.

Add CPU barries after getting version value just like what function
vread_pvclock does, because all of callees in this function is inline.

Fixes: 502dfeff23
Cc: stable@vger.kernel.org
Signed-off-by: Minfei Huang <mnghuan@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-27 15:12:14 +02:00
Arnd Bergmann
b684e9bc75 x86/efi: Remove the unused efi_get_time() function
Nothing calls the efi_get_time() function on x86, but it does suffer
from the 32-bit time_t overflow in 2038.

This removes the function, we can always put it back in case we need
it later.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1466839230-12781-8-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27 13:06:58 +02:00
Alex Thorlton
21f866257c x86/efi: Update efi_thunk() to use the the arch_efi_call_virt*() macros
Currently, the efi_thunk macro has some semi-duplicated code in it that
can be replaced with the arch_efi_call_virt_setup/teardown macros. This
commit simply replaces the duplicated code with those macros.

Suggested-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Alex Thorlton <athorlton@sgi.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roy Franz <roy.franz@linaro.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1466839230-12781-7-git-send-email-matt@codeblueprint.co.uk
[ Renamed variables to the standard __ prefix. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27 13:06:57 +02:00
Alex Thorlton
d1be84a232 x86/uv: Update uv_bios_call() to use efi_call_virt_pointer()
Now that the efi_call_virt() macro has been generalized to be able to
use EFI system tables besides efi.systab, we are able to convert our
uv_bios_call() wrapper to use this standard EFI callback mechanism.

This simple change is part of a much larger effort to recover from some
issues with the way we were mapping in some of our MMRs, and the way
that we were doing our BIOS callbacks, which were uncovered by commit
67a9108ed4 ("x86/efi: Build our own page table structures").

The first issue that this uncovered was that we were relying on the EFI
memory mapping mechanism to map in our MMR space for us, which, while
reliable, was technically a bug, as it relied on "undefined" behavior in
the mapping code.

The reason we were able to piggyback on the EFI memory mapping code to
map in our MMRs was because, previously, EFI code used the
trampoline_pgd, which shares a few entries with the main kernel pgd.  It
just so happened, that the memory range containing our MMRs was inside
one of those shared regions, which kept our code working without issue
for quite a while.

Anyways, once we discovered this problem, we brought back our original
code to map in the MMRs with commit:

  08914f436b ("x86/platform/UV: Bring back the call to map_low_mmrs in uv_system_init")

This got our systems a little further along, but we were still running
into trouble with our EFI callbacks, which prevented us from booting
all the way up.

Our first step towards fixing the BIOS callbacks was to get our
uv_bios_call() wrapper updated to use efi_call_virt() instead of the plain
efi_call().  The previous patch took care of the effort needed to make
that possible.  Along the way, we hit a major issue with some confusion
about how to properly pull arguments higher than number 6 off the stack
in the efi_call() code, which resulted in the following commit from Linus:

  683ad8092c ("x86/efi: Fix 7-parameter efi_call()s")

Now that all of those issues are out of the way, we're able to make this
simple change to use the new efi_call_virt_pointer() in uv_bios_call()
which gets our machines booting, running properly, and able to execute our
callbacks with 6+ arguments.

Note that, since we are now using the EFI page table when we make our
function call, we are no longer able to make the call using the __va()
of our function pointer, since the memory range containing that address
isn't mapped into the EFI page table.  For now, we will use the physical
address of the function directly, since that is mapped into the EFI page
table.  In the near future, we're going to get some code added in to
properly update our function pointer to its virtual address during
SetVirtualAddressMap.

Signed-off-by: Alex Thorlton <athorlton@sgi.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roy Franz <roy.franz@linaro.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1466839230-12781-6-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27 13:06:56 +02:00
Alex Thorlton
80e7559607 efi: Convert efi_call_virt() to efi_call_virt_pointer()
This commit makes a few slight modifications to the efi_call_virt() macro
to get it to work with function pointers that are stored in locations
other than efi.systab->runtime, and renames the macro to
efi_call_virt_pointer().  The majority of the changes here are to pull
these macros up into header files so that they can be accessed from
outside of drivers/firmware/efi/runtime-wrappers.c.

The most significant change not directly related to the code move is to
add an extra "p" argument into the appropriate efi_call macros, and use
that new argument in place of the, formerly hard-coded,
efi.systab->runtime pointer.

The last piece of the puzzle was to add an efi_call_virt() macro back into
drivers/firmware/efi/runtime-wrappers.c to wrap around the new
efi_call_virt_pointer() macro - this was mainly to keep the code from
looking too cluttered by adding a bunch of extra references to
efi.systab->runtime everywhere.

Note that I also broke up the code in the efi_call_virt_pointer() macro a
bit in the process of moving it.

Signed-off-by: Alex Thorlton <athorlton@sgi.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roy Franz <roy.franz@linaro.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1466839230-12781-5-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27 13:06:56 +02:00
Colin Ian King
f6d1747f89 x86/efi: Remove unused variable 'efi'
Remove unused variable 'efi', it is never used. This fixes the following
clang build warning:

  arch/x86/boot/compressed/eboot.c:803:2: warning: Value stored to 'efi' is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1466839230-12781-4-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27 13:06:55 +02:00
Borislav Petkov
dbf984d825 x86/boot/64: Add forgotten end of function marker
Add secondary_startup_64()'s ENDPROC() marker.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20160625112457.16930-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27 12:20:31 +02:00
Ingo Molnar
630741fb60 Merge branch 'sched/urgent' into sched/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27 11:35:02 +02:00
Peter Zijlstra
d4cf1949f9 perf/x86/intel: Add {rd,wr}lbr_{to,from} wrappers
The whole rdmsr()/wrmsr() for lbr_from got a little unweildy with the
sign extension quirk, provide a few simple wrappers to clean things up.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Carrillo-Cisneros <davidcc@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27 11:34:21 +02:00
David Carrillo-Cisneros
71adae99ed perf/x86/intel: Add MSR_LAST_BRANCH_FROM_x quirk for ctx switch
Add quirk for context switch to save/restore the value of
MSR_LAST_BRANCH_FROM_x when LBR is enabled and there is potential for
kernel addresses to be in the lbr_from register.

To test this patch, use a perf tool and kernel with the patch
next in this series. That patch removes the work around that masked
the hw bug:

  $ ./lbr_perf record --call-graph lbr -e cycles:k sleep 1

where lbr_perf is the patched perf tool, that allows to specify :k
on lbr mode. The above command will trigger a #GPF :

 WARNING: CPU: 28 PID: 14096 at arch/x86/mm/extable.c:65 ex_handler_wrmsr_unsafe+0x70/0x80
 unchecked MSR access error: WRMSR to 0x681 (tried to write 0x1fffffff81010794)
 ...
 Call Trace:
  [<ffffffff8167af49>] dump_stack+0x4d/0x63
  [<ffffffff810b9b15>] __warn+0xe5/0x100
  [<ffffffff810b9be9>] warn_slowpath_fmt+0x49/0x50
  [<ffffffff810abb40>] ex_handler_wrmsr_unsafe+0x70/0x80
  [<ffffffff810abc42>] fixup_exception+0x42/0x50
  [<ffffffff81079d1a>] do_general_protection+0x8a/0x160
  [<ffffffff81684ec2>] general_protection+0x22/0x30
  [<ffffffff810101b9>] ? intel_pmu_lbr_sched_task+0xc9/0x380
  [<ffffffff81009d7c>] intel_pmu_sched_task+0x3c/0x60
  [<ffffffff81003a2b>] x86_pmu_sched_task+0x1b/0x20
  [<ffffffff81192a5b>] perf_pmu_sched_task+0x6b/0xb0
  [<ffffffff8119746d>] __perf_event_task_sched_in+0x7d/0x150
  [<ffffffff810dd9dc>] finish_task_switch+0x15c/0x200
  [<ffffffff8167f894>] __schedule+0x274/0x6cc
  [<ffffffff8167fdd9>] schedule+0x39/0x90
  [<ffffffff81675398>] exit_to_usermode_loop+0x39/0x89
  [<ffffffff810028ce>] prepare_exit_to_usermode+0x2e/0x30
  [<ffffffff81683c1b>] retint_user+0x8/0x10

Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1466533874-52003-5-git-send-email-davidcc@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27 11:34:20 +02:00
David Carrillo-Cisneros
3812bba84f perf/x86/intel: Fix trivial formatting and style bug
Replace spaces by tabs in LBR_FROM_* constants to align with newly
defined constant. Use BIT_ULL.

Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1466533874-52003-4-git-send-email-davidcc@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27 11:34:19 +02:00
David Carrillo-Cisneros
19fc9ddd61 perf/x86/intel: Fix MSR_LAST_BRANCH_FROM_x bug when no TSX
Intel's SDM states that bits 61:62 in MSR_LAST_BRANCH_FROM_x are the
TSX flags for formats with LBR_TSX flags (i.e. LBR_FORMAT_EIP_EFLAGS2).

However, when the CPU has TSX support deactivated, bits 61:62 actually
behave as follows:

  - For wrmsr(), bits 61:62 are considered part of the sign extension.
  - When capturing branches, the LBR hw will always clear bits 61:62.
    regardless of the sign extension.

Therefore, if:

  1) LBR has TSX format.
  2) CPU has no TSX support enabled.

... then any value passed to wrmsr() must be sign extended to 63 bits
and any value from rdmsr() must be converted to have a sign extension
of 61 bits, ignoring the values at TSX flags.

This bug was masked by the work-around to the Intel's CPU bug:
BJ94. "LBR May Contain Incorrect Information When Using FREEZE_LBRS_ON_PMI"
in Document Number: 324643-037US.

The aforementioned work-around uses hw flags to filter out all kernel
branches, limiting LBR callstack to user level execution only.

Since user addresses are not sign extended, they do not trigger the wrmsr()
bug in MSR_LAST_BRANCH_FROM_x when saved/restored at context switch.

To verify the hw bug:

  $ perf record -b -e cycles sleep 1
  $ rdmsr -p 0 0x680
  0x1fffffffb0b9b0cc
  $ wrmsr -p 0 0x680 0x1fffffffb0b9b0cc
  write(): Input/output error

The quirk for LBR_FROM_ MSRs is required before calls to wrmsrl() and
after rdmsrl().

This patch introduces it for wrmsrl()'s done for testing LBR support.

Future patch in series adds the quirk for context switch, that would
be required if LBR callstack is to be enabled for ring 0.

Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1466533874-52003-3-git-send-email-davidcc@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27 11:34:19 +02:00
David Carrillo-Cisneros
f09509b939 perf/x86/intel: Print LBR support statement after validation
The following commit:

  338b522ca4 ("perf/x86/intel: Protect LBR and extra_regs against KVM lying")

added an additional test to LBR support detection that is performed after
printing the LBR support statement to dmesg.

Move the LBR support output after the very last test, to make sure we
print the true status of LBR support.

Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1466533874-52003-2-git-send-email-davidcc@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27 11:34:18 +02:00
Ingo Molnar
8114e90ea4 Linux 4.7-rc5
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXcHi9AAoJEHm+PkMAQRiGSJ0H/2o4t9VWYmhyPC1sdIHoCExJ
 P4tBrcZYBmKcsOmIfnJDa5g/+IdhouEUM0v0fHPogS2UUWT9eRuJWYD3sY+HpEQ+
 heKTli8X73gsFB25odeIbIt0jAoSiiMYWDrWqLNsuUV1tjEYVA8rH0SM94FiOC/5
 7WVWXLTuH+Rm7JHP18BnKxmMMbzrTFmwisLMqFKyfZRRSlS+/ix7iLUNO9AFa39B
 YHxNPihLrZ0oONyCOAQoHTIXXrw0cQbxV2utg3vnMcCZdme2xOn+iXMntTSKfZ39
 iC9/T0vsO3R6OrRo2aDZAnCPUAniXnMEIhrKG37WMyXpj6cucZ/2QiNXcXviGV4=
 =iLte
 -----END PGP SIGNATURE-----

Merge tag 'v4.7-rc5' into perf/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27 11:20:46 +02:00
Megha Dey
4c79f6f81a crypto: sha1-mb - rename sha-mb to sha1-mb
Until now, there was only support for the SHA1 multibuffer algorithm.
Hence, there was just one sha-mb folder. Now, with the introduction of
the SHA256 multi-buffer algorithm , it is logical to name the existing
folder as sha1-mb.

Signed-off-by: Megha Dey <megha.dey@linux.intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-06-27 16:57:47 +08:00
Megha Dey
992532474f crypto: sha256-mb - Crypto computation (x8 AVX2)
This patch introduces the assembly routines to do SHA256 computation
on buffers belonging to several jobs at once.  The assembly routines
are optimized with AVX2 instructions that have 8 data lanes and using
AVX2 registers.

Signed-off-by: Megha Dey <megha.dey@linux.intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-06-27 16:57:45 +08:00
Megha Dey
98cf10383a crypto: sha256-mb - Algorithm data structures
This patch introduces the data structures and prototypes of
functions needed for computing SHA256 hash using multi-buffer.
Included are the structures of the multi-buffer SHA256 job,
job scheduler in C and x86 assembly.

Signed-off-by: Megha Dey <megha.dey@linux.intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-06-27 16:57:45 +08:00
Megha Dey
a377c6b187 crypto: sha256-mb - submit/flush routines for AVX2
This patch introduces the routines used to submit and flush buffers
belonging to SHA256 crypto jobs to the SHA256 multibuffer algorithm. It
is implemented mostly in assembly optimized with AVX2 instructions.

Signed-off-by: Megha Dey <megha.dey@linux.intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-06-27 16:57:44 +08:00
Megha Dey
f876f440df crypto: sha256-mb - SHA256 multibuffer job manager and glue code
This patch introduces the multi-buffer job manager which is responsible for
submitting scatter-gather buffers from several SHA256 jobs to the
multi-buffer algorithm. It also contains the flush routine to that's
called by the crypto daemon to complete the job when no new jobs arrive
before the deadline of maximum latency of a SHA256 crypto job.

The SHA256 multi-buffer crypto algorithm is defined and initialized in
this patch.

Signed-off-by: Megha Dey <megha.dey@linux.intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-06-27 16:57:41 +08:00
Yinghai Lu
e066cc4777 x86/KASLR: Allow randomization below the load address
Currently the kernel image physical address randomization's lower
boundary is the original kernel load address.

For bootloaders that load kernels into very high memory (e.g. kexec),
this means randomization takes place in a very small window at the
top of memory, ignoring the large region of physical memory below
the load address.

Since mem_avoid[] is already correctly tracking the regions that must be
avoided, this patch changes the minimum address to whatever is less:
512M (to conservatively avoid unknown things in lower memory) or the
load address. Now, for example, if the kernel is loaded at 8G, [512M,
8G) will be added to the list of possible physical memory positions.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
[ Rewrote the changelog, refactored the code to use min(). ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1464216334-17200-6-git-send-email-keescook@chromium.org
[ Edited the changelog some more, plus the code comment as well. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-26 12:32:05 +02:00
Kees Cook
ed9f007ee6 x86/KASLR: Extend kernel image physical address randomization to addresses larger than 4G
We want the physical address to be randomized anywhere between
16MB and the top of physical memory (up to 64TB).

This patch exchanges the prior slots[] array for the new slot_areas[]
array, and lifts the limitation of KERNEL_IMAGE_SIZE on the physical
address offset for 64-bit. As before, process_e820_entry() walks
memory and populates slot_areas[], splitting on any detected mem_avoid
collisions.

Finally, since the slots[] array and its associated functions are not
needed any more, so they are removed.

Based on earlier patches by Baoquan He.

Originally-from: Baoquan He <bhe@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1464216334-17200-5-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-26 12:32:05 +02:00
Baoquan He
8391c73c96 x86/KASLR: Randomize virtual address separately
The current KASLR implementation randomizes the physical and virtual
addresses of the kernel together (both are offset by the same amount). It
calculates the delta of the physical address where vmlinux was linked
to load and where it is finally loaded. If the delta is not equal to 0
(i.e. the kernel was relocated), relocation handling needs be done.

On 64-bit, this patch randomizes both the physical address where kernel
is decompressed and the virtual address where kernel text is mapped and
will execute from. We now have two values being chosen, so the function
arguments are reorganized to pass by pointer so they can be directly
updated. Since relocation handling only depends on the virtual address,
we must check the virtual delta, not the physical delta for processing
kernel relocations. This also populates the page table for the new
virtual address range. 32-bit does not support a separate virtual address,
so it continues to use the physical offset for its virtual offset.

Additionally updates the sanity checks done on the resulting kernel
addresses since they are potentially separate now.

[kees: rewrote changelog, limited virtual split to 64-bit only, update checks]
[kees: fix CONFIG_RANDOMIZE_BASE=n boot failure]
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1464216334-17200-4-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-26 12:32:04 +02:00
Kees Cook
11fdf97a3c x86/KASLR: Clarify identity map interface
This extracts the call to prepare_level4() into a top-level function
that the user of the pagetable.c interface must call to initialize
the new page tables. For clarity and to match the "finalize" function,
it has been renamed to initialize_identity_maps(). This function also
gains the initialization of mapping_info so we don't have to do it each
time in add_identity_map().

Additionally add copyright notice to the top, to make it clear that the
bulk of the pagetable.c code was written by Yinghai, and that I just
added bugs later. :)

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1464216334-17200-3-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-26 12:32:04 +02:00
Kees Cook
98f7852537 x86/boot: Refuse to build with data relocations
The compressed kernel is built with -fPIC/-fPIE so that it can run in any
location a bootloader happens to put it. However, since ELF relocation
processing is not happening (and all the relocation information has
already been stripped at link time), none of the code can use data
relocations (e.g. static assignments of pointers). This is already noted
in a warning comment at the top of misc.c, but this adds an explicit
check for the condition during the linking stage to block any such bugs
from appearing.

If this was in place with the earlier bug in pagetable.c, the build
would fail like this:

  ...
    CC      arch/x86/boot/compressed/pagetable.o
    DATAREL arch/x86/boot/compressed/vmlinux
  error: arch/x86/boot/compressed/pagetable.o has data relocations!
  make[2]: *** [arch/x86/boot/compressed/vmlinux] Error 1
  ...

A clean build shows:

  ...
    CC      arch/x86/boot/compressed/pagetable.o
    DATAREL arch/x86/boot/compressed/vmlinux
    LD      arch/x86/boot/compressed/vmlinux
  ...

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1464216334-17200-2-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-26 12:32:03 +02:00
Kees Cook
65fe935dd2 x86/KASLR, x86/power: Remove x86 hibernation restrictions
With the following fix:

  70595b479ce1 ("x86/power/64: Fix crash whan the hibernation code passes control to the image kernel")

... there is no longer a problem with hibernation resuming a
KASLR-booted kernel image, so remove the restriction.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux PM list <linux-pm@vger.kernel.org>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/20160613221002.GA29719@www.outflux.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-26 12:32:03 +02:00
Arnd Bergmann
d6faca40f4 rtc: move mc146818 helper functions out-of-line
The mc146818_get_time/mc146818_set_time functions are rather large
inline functions in a global header file and are used in several
drivers and in x86 specific code.

Here we move them into a separate .c file that is compiled whenever
any of the users require it. This also lets us remove the linux/acpi.h
header inclusion from mc146818rtc.h, which in turn avoids some
warnings about duplicate definition of the TRUE/FALSE macros.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2016-06-26 01:20:08 +02:00
Linus Torvalds
9a949a9859 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 kprobe fix from Thomas Gleixner:
 "A single fix clearing the TF bit when a fault is single stepped"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kprobes/x86: Clear TF bit in fault on single-stepping
2016-06-25 06:49:32 -07:00
Linus Torvalds
086e3eb65e Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "Two weeks worth of fixes here"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (41 commits)
  init/main.c: fix initcall_blacklisted on ia64, ppc64 and parisc64
  autofs: don't get stuck in a loop if vfs_write() returns an error
  mm/page_owner: avoid null pointer dereference
  tools/vm/slabinfo: fix spelling mistake: "Ocurrences" -> "Occurrences"
  fs/nilfs2: fix potential underflow in call to crc32_le
  oom, suspend: fix oom_reaper vs. oom_killer_disable race
  ocfs2: disable BUG assertions in reading blocks
  mm, compaction: abort free scanner if split fails
  mm: prevent KASAN false positives in kmemleak
  mm/hugetlb: clear compound_mapcount when freeing gigantic pages
  mm/swap.c: flush lru pvecs on compound page arrival
  memcg: css_alloc should return an ERR_PTR value on error
  memcg: mem_cgroup_migrate() may be called with irq disabled
  hugetlb: fix nr_pmds accounting with shared page tables
  Revert "mm: disable fault around on emulated access bit architecture"
  Revert "mm: make faultaround produce old ptes"
  mailmap: add Boris Brezillon's email
  mailmap: add Antoine Tenart's email
  mm, sl[au]b: add __GFP_ATOMIC to the GFP reclaim mask
  mm: mempool: kasan: don't poot mempool objects in quarantine
  ...
2016-06-24 19:08:33 -07:00
Linus Torvalds
032fd3e58c xen: bug fixes for 4.7-rc4
- Fix x86 PV dom0 crash during early boot on some hardware.
 - Fix two pciback bugs affects certain devices.
 - Fix potential overflow when clearing page tables in x86 PV.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXbTwWAAoJEFxbo/MsZsTRu7IH/1sAn6KFHfP2Px/Sydh/pxZH
 0oOW+2aZLVqu8BRiHj6YeQVRuhzdIgSoU9wMmCFX7rz1m6gq4c60cJF/lKYmlbxp
 0lyxbf+4451rh/qNVV3pm5J+w6R818Y2hoIOu2BK3ppJ4W8nXbW5kHHvtYQCXu0A
 mApSgMHBbWv6kkAxEuUMa5wOipENiAIYg+pFqwo+y9V8sS8zAqqHivct3T6ucNyV
 u/WB076QAnL8abcwKELXsyV5hmcfJv/CoMS9Qv6GwIv1z9d0UVS2+qoo1Qox2sAP
 79AoJn2E6p7rkb/HdhdSYjja22oct1ahrfSgCSBEwLNZCMc5srKdwK6Zspe5y+0=
 =qqrC
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.7b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen bug fixes from David Vrabel:

 - fix x86 PV dom0 crash during early boot on some hardware

 - fix two pciback bugs affects certain devices

 - fix potential overflow when clearing page tables in x86 PV

* tag 'for-linus-4.7b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen-pciback: return proper values during BAR sizing
  x86/xen: avoid m2p lookup when setting early page table entries
  xen/pciback: Fix conf_space read/write overlap check.
  x86/xen: fix upper bound of pmd loop in xen_cleanhighmap()
  xen/balloon: Fix declared-but-not-defined warning
2016-06-24 17:57:37 -07:00
Michal Hocko
f58f230a83 x86/efi: get rid of superfluous __GFP_REPEAT
__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.

efi_alloc_page_tables uses __GFP_REPEAT but it allocates an order-0
page.  This means that this flag has never been actually useful here
because it has always been used only for PAGE_ALLOC_COSTLY requests.

Link: http://lkml.kernel.org/r/1464599699-30131-4-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Michal Hocko
a3a9a59d20 x86: get rid of superfluous __GFP_REPEAT
__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.

PGALLOC_GFP uses __GFP_REPEAT but none of the allocation which uses this
flag is for more than order-0.  This means that this flag has never been
actually useful here because it has always been used only for
PAGE_ALLOC_COSTLY requests.

Link: http://lkml.kernel.org/r/1464599699-30131-3-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Michal Hocko
32d6bd9059 tree wide: get rid of __GFP_REPEAT for order-0 allocations part I
This is the third version of the patchset previously sent [1].  I have
basically only rebased it on top of 4.7-rc1 tree and dropped "dm: get
rid of superfluous gfp flags" which went through dm tree.  I am sending
it now because it is tree wide and chances for conflicts are reduced
considerably when we want to target rc2.  I plan to send the next step
and rename the flag and move to a better semantic later during this
release cycle so we will have a new semantic ready for 4.8 merge window
hopefully.

Motivation:

While working on something unrelated I've checked the current usage of
__GFP_REPEAT in the tree.  It seems that a majority of the usage is and
always has been bogus because __GFP_REPEAT has always been about costly
high order allocations while we are using it for order-0 or very small
orders very often.  It seems that a big pile of them is just a
copy&paste when a code has been adopted from one arch to another.

I think it makes some sense to get rid of them because they are just
making the semantic more unclear.  Please note that GFP_REPEAT is
documented as

* __GFP_REPEAT: Try hard to allocate the memory, but the allocation attempt

* _might_ fail.  This depends upon the particular VM implementation.
  while !costly requests have basically nofail semantic.  So one could
  reasonably expect that order-0 request with __GFP_REPEAT will not loop
  for ever.  This is not implemented right now though.

I would like to move on with __GFP_REPEAT and define a better semantic
for it.

  $ git grep __GFP_REPEAT origin/master | wc -l
  111
  $ git grep __GFP_REPEAT | wc -l
  36

So we are down to the third after this patch series.  The remaining
places really seem to be relying on __GFP_REPEAT due to large allocation
requests.  This still needs some double checking which I will do later
after all the simple ones are sorted out.

I am touching a lot of arch specific code here and I hope I got it right
but as a matter of fact I even didn't compile test for some archs as I
do not have cross compiler for them.  Patches should be quite trivial to
review for stupid compile mistakes though.  The tricky parts are usually
hidden by macro definitions and thats where I would appreciate help from
arch maintainers.

[1] http://lkml.kernel.org/r/1461849846-27209-1-git-send-email-mhocko@kernel.org

This patch (of 19):

__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.  Yet we
have the full kernel tree with its usage for apparently order-0
allocations.  This is really confusing because __GFP_REPEAT is
explicitly documented to allow allocation failures which is a weaker
semantic than the current order-0 has (basically nofail).

Let's simply drop __GFP_REPEAT from those places.  This would allow to
identify place which really need allocator to retry harder and formulate
a more specific semantic for what the flag is supposed to do actually.

Link: http://lkml.kernel.org/r/1464599699-30131-2-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com> [for tile]
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: John Crispin <blogic@openwrt.org>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Linus Torvalds
aca9c293d0 x86: fix up a few misc stack pointer vs thread_info confusions
As the actual pointer value is the same for the thread stack allocation
and the thread_info, code that confused the two worked fine, but will
break when the thread info is moved away from the stack allocation.  It
also looks very confusing.

For example, the kprobe code wanted to know the current top of stack.
To do that, it used this:

	(unsigned long)current_thread_info() + THREAD_SIZE

which did indeed give the correct value.  But it's not only a fairly
nonsensical expression, it's also rather complex, especially since we
actually have this:

	static inline unsigned long current_top_of_stack(void)

which not only gives us the value we are interested in, but happens to
be how "current_thread_info()" is currently defined as:

	(struct thread_info *)(current_top_of_stack() - THREAD_SIZE);

so using current_thread_info() to figure out the top of the stack really
is a very round-about thing to do.

The other cases are just simpler confusion about task_thread_info() vs
task_stack_page(), which currently return the same pointer - but if you
want the stack page, you really should be using the latter one.

And there was one entirely unused assignment of the current stack to a
thread_info pointer.

All cleaned up to make more sense today, and make it easier to move the
thread_info away from the stack in the future.

No semantic changes.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 16:55:53 -07:00
Linus Torvalds
da01e18a37 x86: avoid avoid passing around 'thread_info' in stack dumping code
None of the code actually wants a thread_info, it all wants a
task_struct, and it's just converting to a thread_info pointer much too
early.

No semantic change.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-23 12:20:01 -07:00
Arnd Bergmann
87aeb54f1b kvm: x86: use getboottime64
KVM reads the current boottime value as a struct timespec in order to
calculate the guest wallclock time, resulting in an overflow in 2038
on 32-bit systems.

The data then gets passed as an unsigned 32-bit number to the guest,
and that in turn overflows in 2106.

We cannot do much about the second overflow, which affects both 32-bit
and 64-bit hosts, but we can ensure that they both behave the same
way and don't overflow until 2106, by using getboottime64() to read
a timespec64 value.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-23 19:17:30 +02:00
Ashok Raj
c45dcc71b7 KVM: VMX: enable guest access to LMCE related MSRs
On Intel platforms, this patch adds LMCE to KVM MCE supported
capabilities and handles guest access to LMCE related MSRs.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
[Haozhong: macro KVM_MCE_CAP_SUPPORTED => variable kvm_mce_cap_supported
           Only enable LMCE on Intel platform
           Check MSR_IA32_FEATURE_CONTROL when handling guest
             access to MSR_IA32_MCG_EXT_CTL]
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-23 19:17:29 +02:00
Haozhong Zhang
37e4c997da KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL
KVM currently does not check the value written to guest
MSR_IA32_FEATURE_CONTROL, though bits corresponding to disabled features
may be set. This patch makes KVM to validate individual bits written to
guest MSR_IA32_FEATURE_CONTROL according to enabled features.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-23 19:17:29 +02:00
Haozhong Zhang
3b84080b95 KVM: VMX: move msr_ia32_feature_control to vcpu_vmx
msr_ia32_feature_control will be used for LMCE and not depend only on
nested anymore, so move it from struct nested_vmx to struct vcpu_vmx.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-23 19:17:28 +02:00
David Vrabel
d6b186c1e2 x86/xen: avoid m2p lookup when setting early page table entries
When page tables entries are set using xen_set_pte_init() during early
boot there is no page fault handler that could handle a fault when
performing an M2P lookup.

In 64 bit guests (usually dom0) early_ioremap() would fault in
xen_set_pte_init() because an M2P lookup faults because the MFN is in
MMIO space and not mapped in the M2P.  This lookup is done to see if
the PFN in in the range used for the initial page table pages, so that
the PTE may be set as read-only.

The M2P lookup can be avoided by moving the check (and clear of RW)
earlier when the PFN is still available.

Reported-by: Kevin Moraga <kmoragas@riseup.net>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
2016-06-23 14:50:30 +01:00
Juergen Gross
1cf3874130 x86/xen: fix upper bound of pmd loop in xen_cleanhighmap()
xen_cleanhighmap() is operating on level2_kernel_pgt only. The upper
bound of the loop setting non-kernel-image entries to zero should not
exceed the size of level2_kernel_pgt.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-06-23 11:36:15 +01:00
Megha Dey
331bf739c4 crypto: sha1-mb - async implementation for sha1-mb
Herbert wants the sha1-mb algorithm to have an async implementation:
https://lkml.org/lkml/2016/4/5/286.
Currently, sha1-mb uses an async interface for the outer algorithm
and a sync interface for the inner algorithm. This patch introduces
a async interface for even the inner algorithm.

Signed-off-by: Megha Dey <megha.dey@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-06-23 18:29:55 +08:00
Herbert Xu
7271b33cb8 crypto: ghash-clmulni - Fix cryptd reordering
This patch fixes an old bug where requests can be reordered because
some are processed by cryptd while others are processed directly
in softirq context.

The fix is to always postpone to cryptd if there are currently
requests outstanding from the same tfm.

This patch also removes the redundant use of cryptd in the async
init function as init never touches the FPU.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-06-23 18:29:53 +08:00