Commit Graph

24128 Commits

Author SHA1 Message Date
Yu Zhao
14f4760562 kvm: set page dirty only if page has been writable
In absence of shadow dirty mask, there is no need to set page dirty
if page has never been writable. This is a tiny optimization but
good to have for people who care much about dirty page tracking.

Signed-off-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-01 12:10:10 +02:00
Paolo Bonzini
14ebda3394 KVM: x86: reduce default value of halt_poll_ns parameter
Windows lets applications choose the frequency of the timer tick,
and in Windows 10 the maximum rate was changed from 1024 Hz to
2048 Hz.  Unfortunately, because of the way the Windows API
works, most applications who need a higher rate than the default
64 Hz will just do

   timeGetDevCaps(&tc, sizeof(tc));
   timeBeginPeriod(tc.wPeriodMin);

and pick the maximum rate.  This causes very high CPU usage when
playing media or games on Windows 10, even if the guest does not
actually use the CPU very much, because the frequent timer tick
causes halt_poll_ns to kick in.

There is no really good solution, especially because Microsoft
could sooner or later bump the limit to 4096 Hz, but for now
the best we can do is lower a bit the upper limit for
halt_poll_ns. :-(

Reported-by: Jon Panozzo <jonp@lime-technology.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-01 12:10:10 +02:00
Paolo Bonzini
a2b5c3c0c8 KVM: Hyper-V: do not do hypercall userspace exits if SynIC is disabled
If SynIC is disabled, there is nothing that userspace can do to
handle these exits; on the other hand, userspace probably will
not know about KVM_EXIT_HYPERV_HCALL and complain about it or
even exit.  Just prevent anything bad from happening by handling
the hypercall in KVM and returning an "invalid hypercall" code.

Fixes: 83326e43f2
Cc: Andrey Smetanin <irqlevel@gmail.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-01 12:10:09 +02:00
Yuki Shibuya
321c5658c5 KVM: x86: Inject pending interrupt even if pending nmi exist
Non maskable interrupts (NMI) are preferred to interrupts in current
implementation. If a NMI is pending and NMI is blocked by the result
of nmi_allowed(), pending interrupt is not injected and
enable_irq_window() is not executed, even if interrupts injection is
allowed.

In old kernel (e.g. 2.6.32), schedule() is often called in NMI context.
In this case, interrupts are needed to execute iret that intends end
of NMI. The flag of blocking new NMI is not cleared until the guest
execute the iret, and interrupts are blocked by pending NMI. Due to
this, iret can't be invoked in the guest, and the guest is starved
until block is cleared by some events (e.g. canceling injection).

This patch injects pending interrupts, when it's allowed, even if NMI
is blocked. And, If an interrupts is pending after executing
inject_pending_event(), enable_irq_window() is executed regardless of
NMI pending counter.

Cc: stable@vger.kernel.org
Signed-off-by: Yuki Shibuya <shibuya.yk@ncos.nec.co.jp>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-01 12:10:09 +02:00
Peter Zijlstra
85dc600263 perf/x86/amd/ibs: Fix pmu::stop() nesting
Patch 5a50f52917 ("perf/x86/ibs: Fix race with IBS_STARTING state")
closed a big hole while opening another, smaller hole.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: 5a50f52917 ("perf/x86/ibs: Fix race with IBS_STARTING state")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-31 09:54:08 +02:00
Boris Ostrovsky
dc6416f1d7 xen/x86: Call cpu_startup_entry(CPUHP_AP_ONLINE_IDLE) from xen_play_dead()
This call has always been missing from xen_play dead() but until
recently this was rather benign. With new cpu hotplug framework
(commit 8df3e07e7f ("cpu/hotplug: Let upcoming cpu bring itself fully up").
however this call is required, otherwise a hot-plugged CPU will not
be properly brough up (by never calling cpuhp_online_idle())

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-03-29 09:34:10 -04:00
Konrad Rzeszutek Wilk
8041dcc881 Linux 4.6-rc1
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJW9xVQAAoJEHm+PkMAQRiGLCYH/RaD6PE78ttLoAIm2rTUY+P9
 jtNhRVVNPmNn6103r0g9RYxyN0k+KyRQiO0I3XC77i30ksJV0duqhZfpg9nyHaFZ
 ljiYBLWFcc6qyMiE5B9ogEwveQriKtbUTrIPHer/vEmOdJHDRoUIVZ/qQ+eNzGmq
 aHa0bJalCXqkRpXatfZBUz8OyblrNjZMpX6907vqR5kRrcXqRn52X32gmy6bwSRp
 oRRQvSjzzZzApISz1uQIY5TFEb9ARrtK2FUf/rM0WS4gE1tB/p8Ccx7iopqk/zjI
 f+vZEdQ6o6FdPZ38XZGnwyr1nf52AV5/7wJs+5D1rLCHK8buEJD01BiT7qQvpnI=
 =q0+P
 -----END PGP SIGNATURE-----

Merge tag 'v4.6-rc1' into for-linus-4.6

Linux 4.6-rc1

* tag 'v4.6-rc1': (12823 commits)
  Linux 4.6-rc1
  f2fs/crypto: fix xts_tweak initialization
  NTB: Remove _addr functions from ntb_hw_amd
  orangefs: fix orangefs_superblock locking
  orangefs: fix do_readv_writev() handling of error halfway through
  orangefs: have ->kill_sb() evict the VFS side of things first
  orangefs: sanitize ->llseek()
  orangefs-bufmap.h: trim unused junk
  orangefs: saner calling conventions for getting a slot
  orangefs_copy_{to,from}_bufmap(): don't pass bufmap pointer
  orangefs: get rid of readdir_handle_s
  thp: fix typo in khugepaged_scan_pmd()
  MAINTAINERS: fill entries for KASAN
  mm/filemap: generic_file_read_iter(): check for zero reads unconditionally
  kasan: test fix: warn if the UAF could not be detected in kmalloc_uaf2
  mm, kasan: stackdepot implementation. Enable stackdepot for SLAB
  arch, ftrace: for KASAN put hard/soft IRQ entries into separate sections
  mm, kasan: add GFP flags to KASAN API
  mm, kasan: SLAB support
  kasan: modify kmalloc_large_oob_right(), add kmalloc_pagealloc_oob_right()
  ...
2016-03-29 09:33:47 -04:00
H.J. Lu
6d92bc9d48 x86/build: Build compressed x86 kernels as PIE
The 32-bit x86 assembler in binutils 2.26 will generate R_386_GOT32X
relocation to get the symbol address in PIC.  When the compressed x86
kernel isn't built as PIC, the linker optimizes R_386_GOT32X relocations
to their fixed symbol addresses.  However, when the compressed x86
kernel is loaded at a different address, it leads to the following
load failure:

  Failed to allocate space for phdrs

during the decompression stage.

If the compressed x86 kernel is relocatable at run-time, it should be
compiled with -fPIE, instead of -fPIC, if possible and should be built as
Position Independent Executable (PIE) so that linker won't optimize
R_386_GOT32X relocation to its fixed symbol address.

Older linkers generate R_386_32 relocations against locally defined
symbols, _bss, _ebss, _got and _egot, in PIE.  It isn't wrong, just less
optimal than R_386_RELATIVE.  But the x86 kernel fails to properly handle
R_386_32 relocations when relocating the kernel.  To generate
R_386_RELATIVE relocations, we mark _bss, _ebss, _got and _egot as
hidden in both 32-bit and 64-bit x86 kernels.

To build a 64-bit compressed x86 kernel as PIE, we need to disable the
relocation overflow check to avoid relocation overflow errors. We do
this with a new linker command-line option, -z noreloc-overflow, which
got added recently:

 commit 4c10bbaa0912742322f10d9d5bb630ba4e15dfa7
 Author: H.J. Lu <hjl.tools@gmail.com>
 Date:   Tue Mar 15 11:07:06 2016 -0700

    Add -z noreloc-overflow option to x86-64 ld

    Add -z noreloc-overflow command-line option to the x86-64 ELF linker to
    disable relocation overflow check.  This can be used to avoid relocation
    overflow check if there will be no dynamic relocation overflow at
    run-time.

The 64-bit compressed x86 kernel is built as PIE only if the linker supports
-z noreloc-overflow.  So far 64-bit relocatable compressed x86 kernel
boots fine even when it is built as a normal executable.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
[ Edited the changelog and comments. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-29 12:51:12 +02:00
Huang Rui
34a4cceb78 x86/cpu: Add advanced power management bits
Bit 11 of CPUID 8000_0007 edx is processor feedback interface.
Bit 12 of CPUID 8000_0007 edx is accumulated power.

Print proper names in proc/cpuinfo

Reported-and-tested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Cc: Tony Li <tony.li@amd.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Sherry Hurwitz <sherry.hurwitz@amd.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: "Len Brown" <lenb@kernel.org>
Link: http://lkml.kernel.org/r/1458871720-3209-1-git-send-email-ray.huang@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-03-29 11:12:11 +02:00
Borislav Petkov
5f870a3f71 x86/thread_info: Merge two !__ASSEMBLY__ sections
We have

  #ifndef __ASSEMBLY__
  ...
  #endif

  #ifndef __ASSEMBLY__
  ...
  #endif

Merge the two.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1459189217-25532-1-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-03-29 11:12:10 +02:00
Vladimir Zapolskiy
4a6772f514 x86/cpufreq: Remove duplicated TDP MSR macro definitions
The list of CPU model specific registers contains two copies of TDP
registers, remove the one, which is out of numerical order in the
list.

Fixes: 6a35fc2d6c ("cpufreq: intel_pstate: get P1 from TAR when available")
Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Kristen Carlson
 Accardi <kristen@linux.intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: http://lkml.kernel.org/r/1459018020-24577-1-git-send-email-vladimir_zapolskiy@mentor.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-03-29 11:12:10 +02:00
Borislav Petkov
8196dab4fc x86/cpu: Get rid of compute_unit_id
It is cpu_core_id anyway.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1458917557-8757-3-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-03-29 10:45:04 +02:00
Peter Zijlstra
32b62f4468 perf/x86/amd: Cleanup Fam10h NB event constraints
Avoid allocating the AMD NB event constraints data structure when not
needed. This gets rid of x86_max_cores usage and avoids allocating
this on AMD Core Perfctr supporting hardware (which has separate MSRs
for NB events).

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: aherrmann@suse.com
Cc: Rui Huang <ray.huang@amd.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: jencce.kernel@gmail.com
Link: http://lkml.kernel.org/r/20160320124629.GY6375@twins.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-03-29 10:45:04 +02:00
Peter Zijlstra
ee6825c80e x86/topology: Fix AMD core count
It turns out AMD gets x86_max_cores wrong when there are compute
units.

The issue is that Linux assumes:

	nr_logical_cpus = nr_cores * nr_siblings

But AMD reports its CU unit as 2 cores, but then sets num_smp_siblings
to 2 as well.

Boris: fixup ras/mce_amd_inj.c too, to compute the Node Base Core
properly, according to the new nomenclature.

Fixes: 1f12e32f4c ("x86/topology: Create logical package id")
Reported-by: Xiong Zhou <jencce.kernel@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andreas Herrmann <aherrmann@suse.com>
Cc: Andy Lutomirski <luto@kernel.org>
Link: http://lkml.kernel.org/r/20160317095220.GO6344@twins.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-03-29 10:45:04 +02:00
Dan Williams
fc0c202813 x86, pmem: use memcpy_mcsafe() for memcpy_from_pmem()
Update the definition of memcpy_from_pmem() to return 0 or a negative
error code.  Implement x86/arch_memcpy_from_pmem() with memcpy_mcsafe().

Cc: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-03-28 17:19:31 -07:00
Srinivas Pandruvada
a21211672c ACPI / processor: Request native thermal interrupt handling via _OSC
There are several reports of freeze on enabling HWP (Hardware PStates)
feature on Skylake-based systems by the Intel P-states driver. The root
cause is identified as the HWP interrupts causing BIOS code to freeze.

HWP interrupts use the thermal LVT which can be handled by Linux
natively, but on the affected Skylake-based systems SMM will respond
to it by default.  This is a problem for several reasons:
 - On the affected systems the SMM thermal LVT handler is broken (it
   will crash when invoked) and a BIOS update is necessary to fix it.
 - With thermal interrupt handled in SMM we lose all of the reporting
   features of the arch/x86/kernel/cpu/mcheck/therm_throt driver.
 - Some thermal drivers like x86-package-temp depend on the thermal
   threshold interrupts signaled via the thermal LVT.
 - The HWP interrupts are useful for debugging and tuning
   performance (if the kernel can handle them).
The native handling of thermal interrupts needs to be enabled
because of that.

This requires some way to tell SMM that the OS can handle thermal
interrupts.  That can be done by using _OSC/_PDC in processor
scope very early during ACPI initialization.

The meaning of _OSC/_PDC bit 12 in processor scope is whether or
not the OS supports native handling of interrupts for Collaborative
Processor Performance Control (CPPC) notifications.  Since on
HWP-capable systems CPPC is a firmware interface to HWP, setting
this bit effectively tells the firmware that the OS will handle
thermal interrupts natively going forward.

For details on _OSC/_PDC refer to:
http://www.intel.com/content/www/us/en/standards/processor-vendor-specific-acpi-specification.html

To implement the _OSC/_PDC handshake as described, introduce a new
function, acpi_early_processor_osc(), that walks the ACPI
namespace looking for ACPI processor objects and invokes _OSC for
them with bit 12 in the capabilities buffer set and terminates the
namespace walk on the first success.

Also modify intel_thermal_interrupt() to clear HWP status bits in
the HWP_STATUS MSR to acknowledge HWP interrupts (which prevents
them from firing continuously).

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Subject & changelog, function rename ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-26 02:00:38 +01:00
Alexander Potapenko
cd11016e5f mm, kasan: stackdepot implementation. Enable stackdepot for SLAB
Implement the stack depot and provide CONFIG_STACKDEPOT.  Stack depot
will allow KASAN store allocation/deallocation stack traces for memory
chunks.  The stack traces are stored in a hash table and referenced by
handles which reside in the kasan_alloc_meta and kasan_free_meta
structures in the allocated memory chunks.

IRQ stack traces are cut below the IRQ entry point to avoid unnecessary
duplication.

Right now stackdepot support is only enabled in SLAB allocator.  Once
KASAN features in SLAB are on par with those in SLUB we can switch SLUB
to stackdepot as well, thus removing the dependency on SLUB stack
bookkeeping, which wastes a lot of memory.

This patch is based on the "mm: kasan: stack depots" patch originally
prepared by Dmitry Chernenkov.

Joonsoo has said that he plans to reuse the stackdepot code for the
mm/page_owner.c debugging facility.

[akpm@linux-foundation.org: s/depot_stack_handle/depot_stack_handle_t]
[aryabinin@virtuozzo.com: comment style fixes]
Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-25 16:37:42 -07:00
Alexander Potapenko
be7635e728 arch, ftrace: for KASAN put hard/soft IRQ entries into separate sections
KASAN needs to know whether the allocation happens in an IRQ handler.
This lets us strip everything below the IRQ entry point to reduce the
number of unique stack traces needed to be stored.

Move the definition of __irq_entry to <linux/interrupt.h> so that the
users don't need to pull in <linux/ftrace.h>.  Also introduce the
__softirq_entry macro which is similar to __irq_entry, but puts the
corresponding functions to the .softirqentry.text section.

Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-25 16:37:42 -07:00
Boris Ostrovsky
ed6069be72 xen/apic: Provide Xen-specific version of cpu_present_to_apicid APIC op
Currently Xen uses default_cpu_present_to_apicid() which will always
report BAD_APICID for PV guests since x86_bios_cpu_apic_id is initialised
to that value and is never updated.

With commit 1f12e32f4c ("x86/topology: Create logical package id"), this
op is now called by smp_init_package_map() when deciding whether to call
topology_update_package_map() which sets cpu_data(cpu).logical_proc_id.
The latter (as topology_logical_package_id(cpu)) may be used, for example,
by cpu_to_rapl_pmu() as an array index. Since uninitialized
logical_package_id is set to -1, the index will become 64K which is
obviously problematic.

While RAPL code (and any other users of logical_package_id) should be
careful in their assumptions about id's validity, Xen's
cpu_present_to_apicid op should still provide value consistent with its
own xen_apic_read(APIC_ID).

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-03-25 11:42:53 -04:00
Huang Rui
a49ac9f83b perf/x86: Move events_sysfs_show() outside CPU_SUP_INTEL
randconfig builds can sometimes disable CONFIG_CPU_SUP_INTEL while
enabling the AMD power reporting PMU driver, resulting in this
build failure:

  arch/x86/kernel/cpu/perf_event.h:663:31: error: 'events_sysfs_show' undeclared here (not in a function)

To fix it, move events_sysfs_show() outside of #ifdef CONFIG_CPU_SUP_INTEL.

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: build test robot <lkp@intel.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sherry Hurwitz <sherry.hurwitz@amd.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Cc: kbuild-all@01.org
Cc: linux-next@vger.kernel.org
Cc: spg_linux_kernel@amd.com
Link: http://lkml.kernel.org/r/1458875905-4278-1-git-send-email-ray.huang@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-25 09:46:53 +01:00
Linus Torvalds
e46b4e2b46 Nothing major this round. Mostly small clean ups and fixes.
Some visible changes:
 
  A new flag was added to distinguish traces done in NMI context.
 
  Preempt tracer now shows functions where preemption is disabled but
  interrupts are still enabled.
 
 Other notes:
 
  Updates were done to function tracing to allow better performance
  with perf.
 
  Infrastructure code has been added to allow for a new histogram
  feature for recording live trace event histograms that can be
  configured by simple user commands. The feature itself was just
  finished, but needs a round in linux-next before being pulled.
  This only includes some infrastructure changes that will be needed.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJW8/WPAAoJEKKk/i67LK/8wrAH/j2gU9ZfjVxTu8068TBGWRJP
 yvvzq0cK5evB3dsVuUmKKRfU52nSv4J1WcFF569X0RulSLylR0dHlcxFJMn4kkgR
 bm0AHRrqOf87ub3VimcpG146iVQij37l5A0SRoFbvSPLQx1KUW18v99x41Ji8dv6
 oWXRc6/YhdzEE7l0nUsVjmScQ4b2emsems3cxZzXOY+nRJsiim6i+VaDeatdyey1
 csLVqtRCs+x62TVtxG3+GhcLdRoPRbnHAGzrKDFIn1SrQaRXCc54wN5d2hWxjgNI
 1laOwaj070lnJiWfBLIP/K+lx+VKRx5/O0rKZX35foLUTqJJKSyjAbKXuMCcSAM=
 =2h2K
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "Nothing major this round.  Mostly small clean ups and fixes.

  Some visible changes:

   - A new flag was added to distinguish traces done in NMI context.

   - Preempt tracer now shows functions where preemption is disabled but
     interrupts are still enabled.

  Other notes:

   - Updates were done to function tracing to allow better performance
     with perf.

   - Infrastructure code has been added to allow for a new histogram
     feature for recording live trace event histograms that can be
     configured by simple user commands.  The feature itself was just
     finished, but needs a round in linux-next before being pulled.

     This only includes some infrastructure changes that will be needed"

* tag 'trace-v4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (22 commits)
  tracing: Record and show NMI state
  tracing: Fix trace_printk() to print when not using bprintk()
  tracing: Remove redundant reset per-CPU buff in irqsoff tracer
  x86: ftrace: Fix the misleading comment for arch/x86/kernel/ftrace.c
  tracing: Fix crash from reading trace_pipe with sendfile
  tracing: Have preempt(irqs)off trace preempt disabled functions
  tracing: Fix return while holding a lock in register_tracer()
  ftrace: Use kasprintf() in ftrace_profile_tracefs()
  ftrace: Update dynamic ftrace calls only if necessary
  ftrace: Make ftrace_hash_rec_enable return update bool
  tracing: Fix typoes in code comment and printk in trace_nop.c
  tracing, writeback: Replace cgroup path to cgroup ino
  tracing: Use flags instead of bool in trigger structure
  tracing: Add an unreg_all() callback to trigger commands
  tracing: Add needs_rec flag to event triggers
  tracing: Add a per-event-trigger 'paused' field
  tracing: Add get_syscall_name()
  tracing: Add event record param to trigger_ops.func()
  tracing: Make event trigger functions available
  tracing: Make ftrace_event_field checking functions available
  ...
2016-03-24 10:52:25 -07:00
Linus Torvalds
3fa2fe2ce0 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "This tree contains various perf fixes on the kernel side, plus three
  hw/event-enablement late additions:

   - Intel Memory Bandwidth Monitoring events and handling
   - the AMD Accumulated Power Mechanism reporting facility
   - more IOMMU events

  ... and a final round of perf tooling updates/fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
  perf llvm: Use strerror_r instead of the thread unsafe strerror one
  perf llvm: Use realpath to canonicalize paths
  perf tools: Unexport some methods unused outside strbuf.c
  perf probe: No need to use formatting strbuf method
  perf help: Use asprintf instead of adhoc equivalents
  perf tools: Remove unused perf_pathdup, xstrdup functions
  perf tools: Do not include stringify.h from the kernel sources
  tools include: Copy linux/stringify.h from the kernel
  tools lib traceevent: Remove redundant CPU output
  perf tools: Remove needless 'extern' from function prototypes
  perf tools: Simplify die() mechanism
  perf tools: Remove unused DIE_IF macro
  perf script: Remove lots of unused arguments
  perf thread: Rename perf_event__preprocess_sample_addr to thread__resolve
  perf machine: Rename perf_event__preprocess_sample to machine__resolve
  perf tools: Add cpumode to struct perf_sample
  perf tests: Forward the perf_sample in the dwarf unwind test
  perf tools: Remove misplaced __maybe_unused
  perf list: Fix documentation of :ppp
  perf bench numa: Fix assertion for nodes bitfield
  ...
2016-03-24 10:02:14 -07:00
Linus Torvalds
d88f48e128 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes:

   - fix hotplug bugs
   - fix irq live lock
   - fix various topology handling bugs
   - fix APIC ACK ordering
   - fix PV iopl handling
   - fix speling
   - fix/tweak memcpy_mcsafe() return value
   - fix fbcon bug
   - remove stray prototypes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/msr: Remove unused native_read_tscp()
  x86/apic: Remove declaration of unused hw_nmi_is_cpu_stuck
  x86/oprofile/nmi: Add missing hotplug FROZEN handling
  x86/hpet: Use proper mask to modify hotplug action
  x86/apic/uv: Fix the hotplug notifier
  x86/apb/timer: Use proper mask to modify hotplug action
  x86/topology: Use total_cpus not nr_cpu_ids for logical packages
  x86/topology: Fix Intel HT disable
  x86/topology: Fix logical package mapping
  x86/irq: Cure live lock in fixup_irqs()
  x86/tsc: Prevent NULL pointer deref in calibrate_delay_is_known()
  x86/apic: Fix suspicious RCU usage in smp_trace_call_function_interrupt()
  x86/iopl: Fix iopl capability check on Xen PV
  x86/iopl/64: Properly context-switch IOPL on Xen PV
  selftests/x86: Add an iopl test
  x86/mm, x86/mce: Fix return type/value for memcpy_mcsafe()
  x86/video: Don't assume all FB devices are PCI devices
  arch/x86/irq: Purge useless handler declarations from hw_irq.h
  x86: Fix misspellings in comments
2016-03-24 09:47:32 -07:00
Prarit Bhargava
9da77666d6 x86/msr: Remove unused native_read_tscp()
After e76b027 ("x86,vdso: Use LSL unconditionally for vgetcpu")
native_read_tscp() is unused in the kernel. The function can be removed like
native_read_tsc() was.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1458687968-9106-1-git-send-email-prarit@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-03-23 12:34:17 +01:00
Linus Torvalds
a24e3d414e Merge branch 'akpm' (patches from Andrew)
Merge third patch-bomb from Andrew Morton:

 - more ocfs2 changes

 - a few hotfixes

 - Andy's compat cleanups

 - misc fixes to fatfs, ptrace, coredump, cpumask, creds, eventfd,
   panic, ipmi, kgdb, profile, kfifo, ubsan, etc.

 - many rapidio updates: fixes, new drivers.

 - kcov: kernel code coverage feature.  Like gcov, but not
   "prohibitively expensive".

 - extable code consolidation for various archs

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (81 commits)
  ia64/extable: use generic search and sort routines
  x86/extable: use generic search and sort routines
  s390/extable: use generic search and sort routines
  alpha/extable: use generic search and sort routines
  kernel/...: convert pr_warning to pr_warn
  drivers: dma-coherent: use memset_io for DMA_MEMORY_IO mappings
  drivers: dma-coherent: use MEMREMAP_WC for DMA_MEMORY_MAP
  memremap: add MEMREMAP_WC flag
  memremap: don't modify flags
  kernel/signal.c: add compile-time check for __ARCH_SI_PREAMBLE_SIZE
  mm/mprotect.c: don't imply PROT_EXEC on non-exec fs
  ipc/sem: make semctl setting sempid consistent
  ubsan: fix tree-wide -Wmaybe-uninitialized false positives
  kfifo: fix sparse complaints
  scripts/gdb: account for changes in module data structure
  scripts/gdb: add cmdline reader command
  scripts/gdb: add version command
  kernel: add kcov code coverage
  profile: hide unused functions when !CONFIG_PROC_FS
  hpwdt: use nmi_panic() when kernel panics in NMI handler
  ...
2016-03-22 17:09:14 -07:00
Linus Torvalds
b91d9c6716 * Build fixes for PPC KVM
* Miscellaneous bugfixes for ARM KVM
 * Cleanup of memory barrier and removal of redundant barriers
 * x86 fixes: page tracking oops, support for old buggy KVM nested on 4.5
 * Support for protection keys in guests
 * Lockdep fix
 * Another conversion to simple wait queues and raw spinlocks,
   backported from PREEMPT_RT
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJW8aaXAAoJEL/70l94x66D7voH/i2ytj6PbuWQQobSKDY38x8F
 MHDFJ5UgTFZPPt8cB8YiCl6Tu0C5I2mNOk0rfb+bcpM5C1U9IAnBbbupyUblp6K9
 1u+u+al8IlnOsoLzJSUXKDK5H4mEVrUnwVxTpZol5Ph5qQ8FvpbkxboMu3AGevO5
 PIUXucK7fP5WXVV3Nh4YnUnBkeYzuuXcqYcV/TjscNQ4NcMofElcgpBxmE498TTk
 rvhyuf2chEtY2DsDh3nzeYgxcGLpvE4/l5a+puEoOx4M5CH24wwne9LHAWJz6ofm
 H3XNhsCz3jIGmrNqqkGUUya5qSkCsq2ha7n+VDw+fiP1TKy3FtkrBYQrDj+ISuc=
 =UtvF
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more KVM updates from Paolo Bonzini:
 "Second round of KVM changes for 4.6:

   - build fixes for PPC KVM
   - miscellaneous bugfixes for ARM KVM
   - cleanup of memory barrier and removal of redundant barriers
   - x86 fixes: page tracking oops, support for old buggy KVM nested on 4.5
   - support for protection keys in guests
   - lockdep fix
   - another conversion to simple wait queues and raw spinlocks,
     backported from PREEMPT_RT"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (27 commits)
  KVM: page_track: fix access to NULL slot
  KVM: PPC: do not compile in vfio.o unconditionally
  kvm, rt: change async pagefault code locking for PREEMPT_RT
  KVM/PPC: update the comment of memory barrier in the kvmppc_prepare_to_enter()
  KVM/x86: update the comment of memory barrier in the vcpu_enter_guest()
  KVM: Replace smp_mb() with smp_load_acquire() in the kvm_flush_remote_tlbs()
  KVM/x86: Call smp_wmb() before increasing tlbs_dirty
  KVM: Replace smp_mb() with smp_mb_after_atomic() in the kvm_make_all_cpus_request()
  KVM/x86: Replace smp_mb() with smp_store_mb/release() in the walk_shadow_page_lockless_begin/end()
  KVM: Remove redundant smp_mb() in the kvm_mmu_commit_zap_page()
  KVM, pkeys: expose CPUID/CR4 to guest
  KVM, pkeys: add pkeys support for permission_fault
  KVM, pkeys: introduce pkru_mask to cache conditions
  KVM, pkeys: save/restore PKRU when guest/host switches
  x86: pkey: introduce write_pkru() for KVM
  KVM, pkeys: add pkeys support for xsave state
  KVM, pkeys: disable pkeys for guests in non-paging mode
  KVM: x86: remove magic number with enum cpuid_leafs
  KVM: MMU: return page fault error code from permission_fault
  KVM: fix spin_lock_init order on x86
  ...
2016-03-22 16:28:22 -07:00
Ard Biesheuvel
29934b0fb8 x86/extable: use generic search and sort routines
Replace the arch specific versions of search_extable() and
sort_extable() with calls to the generic ones, which now support
relative exception tables as well.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22 15:36:02 -07:00
Dmitry Vyukov
5c9a8750a6 kernel: add kcov code coverage
kcov provides code coverage collection for coverage-guided fuzzing
(randomized testing).  Coverage-guided fuzzing is a testing technique
that uses coverage feedback to determine new interesting inputs to a
system.  A notable user-space example is AFL
(http://lcamtuf.coredump.cx/afl/).  However, this technique is not
widely used for kernel testing due to missing compiler and kernel
support.

kcov does not aim to collect as much coverage as possible.  It aims to
collect more or less stable coverage that is function of syscall inputs.
To achieve this goal it does not collect coverage in soft/hard
interrupts and instrumentation of some inherently non-deterministic or
non-interesting parts of kernel is disbled (e.g.  scheduler, locking).

Currently there is a single coverage collection mode (tracing), but the
API anticipates additional collection modes.  Initially I also
implemented a second mode which exposes coverage in a fixed-size hash
table of counters (what Quentin used in his original patch).  I've
dropped the second mode for simplicity.

This patch adds the necessary support on kernel side.  The complimentary
compiler support was added in gcc revision 231296.

We've used this support to build syzkaller system call fuzzer, which has
found 90 kernel bugs in just 2 months:

  https://github.com/google/syzkaller/wiki/Found-Bugs

We've also found 30+ bugs in our internal systems with syzkaller.
Another (yet unexplored) direction where kcov coverage would greatly
help is more traditional "blob mutation".  For example, mounting a
random blob as a filesystem, or receiving a random blob over wire.

Why not gcov.  Typical fuzzing loop looks as follows: (1) reset
coverage, (2) execute a bit of code, (3) collect coverage, repeat.  A
typical coverage can be just a dozen of basic blocks (e.g.  an invalid
input).  In such context gcov becomes prohibitively expensive as
reset/collect coverage steps depend on total number of basic
blocks/edges in program (in case of kernel it is about 2M).  Cost of
kcov depends only on number of executed basic blocks/edges.  On top of
that, kernel requires per-thread coverage because there are always
background threads and unrelated processes that also produce coverage.
With inlined gcov instrumentation per-thread coverage is not possible.

kcov exposes kernel PCs and control flow to user-space which is
insecure.  But debugfs should not be mapped as user accessible.

Based on a patch by Quentin Casasnovas.

[akpm@linux-foundation.org: make task_struct.kcov_mode have type `enum kcov_mode']
[akpm@linux-foundation.org: unbreak allmodconfig]
[akpm@linux-foundation.org: follow x86 Makefile layout standards]
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: syzkaller <syzkaller@googlegroups.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Tavis Ormandy <taviso@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@google.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: David Drysdale <drysdale@google.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22 15:36:02 -07:00
Andy Lutomirski
f970165bee x86/compat: remove is_compat_task()
x86's is_compat_task always checked the current syscall type, not the
task type.  It has no non-arch users any more, so just remove it to
avoid confusion.

On x86, nothing should really be checking the task ABI.  There are
legitimate users for the syscall ABI and for the mm ABI.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22 15:36:02 -07:00
Linus Torvalds
55fc733c7e xen: features and fixes for 4.6-rc0
- Make earlyprintk=xen work for HVM guests.
 - Remove module support for things never built as modules.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJW8YhlAAoJEFxbo/MsZsTRiCgH/3GnL93s/BsRTrA/DYykYqA4
 rPPkDd6WMlEbrko6FY7o4IxUAZ50LWU8wjmDNVTT+R/VKlWCVh2+ksxjbH8e0nzr
 0QohAYTdG1VGmhkrwTjKj0wC3Nr8vaDqoBXAFRz4DiQdbobn12wzXjaVrbl8RRNc
 oHDqR+DfZj6ontqx8oFkkTk7CFKIGEOtm/uBhCAV2fNkQSUCzuQyKLlarxM6jq/2
 EnLR1bhnyoy5zNFzXxmaJgZOit8QFMKvPdDRknlp9yDHYJ5zolJkv+6FPiAG6SZs
 A8yAUPQoAD+ekAiV2DIhb8qoNNNdZXdRCFBpoudwX3GyyU4O2P5Intl0xEg17Nk=
 =L8S4
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.6-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from David Vrabel:
 "Features and fixes for 4.6:

  - Make earlyprintk=xen work for HVM guests

  - Remove module support for things never built as modules"

* tag 'for-linus-4.6-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  drivers/xen: make platform-pci.c explicitly non-modular
  drivers/xen: make sys-hypervisor.c explicitly non-modular
  drivers/xen: make xenbus_dev_[front/back]end explicitly non-modular
  drivers/xen: make [xen-]ballon explicitly non-modular
  xen: audit usages of module.h ; remove unnecessary instances
  xen/x86: Drop mode-selecting ifdefs in startup_xen()
  xen/x86: Zero out .bss for PV guests
  hvc_xen: make early_printk work with HVM guests
  hvc_xen: fix xenboot for DomUs
  hvc_xen: add earlycon support
2016-03-22 12:55:17 -07:00
Paolo Bonzini
a6adb10622 KVM: page_track: fix access to NULL slot
This happens when doing the reboot test from virt-tests:

[  131.833653] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  131.842461] IP: [<ffffffffa0950087>] kvm_page_track_is_active+0x17/0x60 [kvm]
[  131.850500] PGD 0
[  131.852763] Oops: 0000 [#1] SMP
[  132.007188] task: ffff880075fbc500 ti: ffff880850a3c000 task.ti: ffff880850a3c000
[  132.138891] Call Trace:
[  132.141639]  [<ffffffffa092bd11>] page_fault_handle_page_track+0x31/0x40 [kvm]
[  132.149732]  [<ffffffffa093380f>] paging64_page_fault+0xff/0x910 [kvm]
[  132.172159]  [<ffffffffa092c734>] kvm_mmu_page_fault+0x64/0x110 [kvm]
[  132.179372]  [<ffffffffa06743c2>] handle_exception+0x1b2/0x430 [kvm_intel]
[  132.187072]  [<ffffffffa067a301>] vmx_handle_exit+0x1e1/0xc50 [kvm_intel]
...

Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Fixes: 3d0c27ad6e
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 17:27:28 +01:00
Rik van Riel
9db284f303 kvm, rt: change async pagefault code locking for PREEMPT_RT
The async pagefault wake code can run from the idle task in exception
context, so everything here needs to be made non-preemptible.

Conversion to a simple wait queue and raw spinlock does the trick.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 16:38:38 +01:00
Lan Tianyu
0f127d12e4 KVM/x86: update the comment of memory barrier in the vcpu_enter_guest()
The barrier also orders the write to mode from any reads
to the page tables done and so update the comment.

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 16:38:35 +01:00
Lan Tianyu
7bfdf21778 KVM/x86: Call smp_wmb() before increasing tlbs_dirty
Update spte before increasing tlbs_dirty to make sure no tlb flush
in lost after spte is zapped. This pairs with the barrier in the
kvm_flush_remote_tlbs().

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 16:38:32 +01:00
Lan Tianyu
36ca7e0a57 KVM/x86: Replace smp_mb() with smp_store_mb/release() in the walk_shadow_page_lockless_begin/end()
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 16:38:29 +01:00
Lan Tianyu
9753f52915 KVM: Remove redundant smp_mb() in the kvm_mmu_commit_zap_page()
There is already a barrier inside of kvm_flush_remote_tlbs() which can
help to make sure everyone sees our modifications to the page tables and
see changes to vcpu->mode here. So remove the smp_mb in the
kvm_mmu_commit_zap_page() and update the comment.

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 16:38:27 +01:00
Huaitong Han
b9baba8614 KVM, pkeys: expose CPUID/CR4 to guest
X86_FEATURE_PKU is referred to as "PKU" in the hardware documentation:
CPUID.7.0.ECX[3]:PKU. X86_FEATURE_OSPKE is software support for pkeys,
enumerated with CPUID.7.0.ECX[4]:OSPKE, and it reflects the setting of
CR4.PKE(bit 22).

This patch disables CPUID:PKU without ept, because pkeys is not yet
implemented for shadow paging.

Signed-off-by: Huaitong Han <huaitong.han@intel.com>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 16:38:17 +01:00
Huaitong Han
be94f6b710 KVM, pkeys: add pkeys support for permission_fault
Protection keys define a new 4-bit protection key field (PKEY) in bits
62:59 of leaf entries of the page tables, the PKEY is an index to PKRU
register(16 domains), every domain has 2 bits(write disable bit, access
disable bit).

Static logic has been produced in update_pkru_bitmask, dynamic logic need
read pkey from page table entries, get pkru value, and deduce the correct
result.

[ Huaitong: Xiao helps to modify many sections. ]

Signed-off-by: Huaitong Han <huaitong.han@intel.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 16:23:37 +01:00
Huaitong Han
2d344105f5 KVM, pkeys: introduce pkru_mask to cache conditions
PKEYS defines a new status bit in the PFEC. PFEC.PK (bit 5), if some
conditions is true, the fault is considered as a PKU violation.
pkru_mask indicates if we need to check PKRU.ADi and PKRU.WDi, and
does cache some conditions for permission_fault.

[ Huaitong: Xiao helps to modify many sections. ]

Signed-off-by: Huaitong Han <huaitong.han@intel.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 16:21:06 +01:00
Xiao Guangrong
1be0e61c1f KVM, pkeys: save/restore PKRU when guest/host switches
Currently XSAVE state of host is not restored after VM-exit and PKRU
is managed by XSAVE so the PKRU from guest is still controlling the
memory access even if the CPU is running the code of host. This is
not safe as KVM needs to access the memory of userspace (e,g QEMU) to
do some emulation.

So we save/restore PKRU when guest/host switches.

Signed-off-by: Huaitong Han <huaitong.han@intel.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 16:21:06 +01:00
Xiao Guangrong
9e90199c25 x86: pkey: introduce write_pkru() for KVM
KVM will use it to switch pkru between guest and host.

CC: Ingo Molnar <mingo@redhat.com>
CC: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Huaitong Han <huaitong.han@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 16:21:05 +01:00
Huaitong Han
17a511f878 KVM, pkeys: add pkeys support for xsave state
This patch adds pkeys support for xsave state.

Signed-off-by: Huaitong Han <huaitong.han@intel.com>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 16:21:05 +01:00
Huaitong Han
ddba262891 KVM, pkeys: disable pkeys for guests in non-paging mode
Pkeys is disabled if CPU is in non-paging mode in hardware. However KVM
always uses paging mode to emulate guest non-paging, mode with TDP. To
emulate this behavior, pkeys needs to be manually disabled when guest
switches to non-paging mode.

Signed-off-by: Huaitong Han <huaitong.han@intel.com>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 16:21:04 +01:00
Huaitong Han
e0b18ef718 KVM: x86: remove magic number with enum cpuid_leafs
This patch removes magic number with enum cpuid_leafs.

Signed-off-by: Huaitong Han <huaitong.han@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 16:21:04 +01:00
Paolo Bonzini
f13577e8aa KVM: MMU: return page fault error code from permission_fault
This will help in the implementation of PKRU, where the PK bit of the page
fault error code cannot be computed in advance (unlike I/D, R/W and U/S).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 16:20:54 +01:00
Paolo Bonzini
ef697a712a KVM: VMX: fix nested vpid for old KVM guests
Old KVM guests invoke single-context invvpid without actually checking
whether it is supported.  This was fixed by commit 518c8ae ("KVM: VMX:
Make sure single type invvpid is supported before issuing invvpid
instruction", 2010-08-01) and the patch after, but pre-2.6.36
kernels lack it including RHEL 6.

Reported-by: jmontleo@redhat.com
Tested-by: jmontleo@redhat.com
Cc: stable@vger.kernel.org
Fixes: 99b83ac893
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 12:02:46 +01:00
Paolo Bonzini
f6870ee9e5 KVM: VMX: avoid guest hang on invalid invvpid instruction
A guest executing an invalid invvpid instruction would hang
because the instruction pointer was not updated.

Reported-by: jmontleo@redhat.com
Tested-by: jmontleo@redhat.com
Cc: stable@vger.kernel.org
Fixes: 99b83ac893
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 12:02:42 +01:00
Paolo Bonzini
2849eb4f99 KVM: VMX: avoid guest hang on invalid invept instruction
A guest executing an invalid invept instruction would hang
because the instruction pointer was not updated.

Cc: stable@vger.kernel.org
Fixes: bfd0a56b90
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 12:02:38 +01:00
Srinivas Pandruvada
7b0fd56930 perf/x86/intel/rapl: Add missing Broadwell models
Added Broadwell-H and Broadwell-Server.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: bp@alien8.de
Link: http://lkml.kernel.org/r/1458517938-25308-1-git-send-email-srinivas.pandruvada@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21 11:16:19 +01:00
Kan Liang
cb2252522a perf/x86/intel/uncore: Remove ev_sel_ext bit support for PCU
The ev_sel_ext in PCU_MSR_PMON_CTL is locked on some CPU models, so despite
it being documented in the SDM, if we write 1 to that bit then we can get a #GP
fault.

Which #GP the perf fuzzer happily triggered in Peter Zijlstra's testing.

Also, there are no public events which use that bit, so remove ev_sel_ext
bit support for PCU.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1458500301-3594-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21 11:16:19 +01:00
Huang Rui
c7ab62bfbe perf/x86/amd/power: Add AMD accumulated power reporting mechanism
Introduce an AMD accumlated power reporting mechanism for the Family
15h, Model 60h processor that can be used to calculate the average
power consumed by a processor during a measurement interval. The
feature support is indicated by CPUID Fn8000_0007_EDX[12].

This feature will be implemented both in hwmon and perf. The current
design provides one event to report per package/processor power
consumption by counting each compute unit power value.

Here the gory details of how the computation is done:

* Tsample: compute unit power accumulator sample period
* Tref: the PTSC counter period (PTSC: performance timestamp counter)
* N: the ratio of compute unit power accumulator sample period to the
  PTSC period

* Jmax: max compute unit accumulated power which is indicated by
  MSR_C001007b[MaxCpuSwPwrAcc]

* Jx/Jy: compute unit accumulated power which is indicated by
  MSR_C001007a[CpuSwPwrAcc]

* Tx/Ty: the value of performance timestamp counter which is indicated
  by CU_PTSC MSR_C0010280[PTSC]
* PwrCPUave: CPU average power

i. Determine the ratio of Tsample to Tref by executing CPUID Fn8000_0007.
	N = value of CPUID Fn8000_0007_ECX[CpuPwrSampleTimeRatio[15:0]].

ii. Read the full range of the cumulative energy value from the new
    MSR MaxCpuSwPwrAcc.
	Jmax = value returned.

iii. At time x, software reads CpuSwPwrAcc and samples the PTSC.
	Jx = value read from CpuSwPwrAcc and Tx = value read from PTSC.

iv. At time y, software reads CpuSwPwrAcc and samples the PTSC.
	Jy = value read from CpuSwPwrAcc and Ty = value read from PTSC.

v. Calculate the average power consumption for a compute unit over
time period (y-x). Unit of result is uWatt:

	if (Jy < Jx) // Rollover has occurred
		Jdelta = (Jy + Jmax) - Jx
	else
		Jdelta = Jy - Jx
	PwrCPUave = N * Jdelta * 1000 / (Ty - Tx)

Simple example:

  root@hr-zp:/home/ray/tip# ./tools/perf/perf stat -a -e 'power/power-pkg/' make -j4
    CHK     include/config/kernel.release
    CHK     include/generated/uapi/linux/version.h
    CHK     include/generated/utsrelease.h
    CHK     include/generated/timeconst.h
    CHK     include/generated/bounds.h
    CHK     include/generated/asm-offsets.h
    CALL    scripts/checksyscalls.sh
    CHK     include/generated/compile.h
    SKIPPED include/generated/compile.h
    Building modules, stage 2.
  Kernel: arch/x86/boot/bzImage is ready  (#40)
    MODPOST 4225 modules

   Performance counter stats for 'system wide':

              183.44 mWatts power/power-pkg/

       341.837270111 seconds time elapsed

  root@hr-zp:/home/ray/tip# ./tools/perf/perf stat -a -e 'power/power-pkg/' sleep 10

   Performance counter stats for 'system wide':

                0.18 mWatts power/power-pkg/

        10.012551815 seconds time elapsed

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: jacob.w.shin@gmail.com
Link: http://lkml.kernel.org/r/1457502306-2559-1-git-send-email-ray.huang@amd.com
[ Fixed the modular build. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21 09:37:15 +01:00
Huang Rui
01fe03ff1c x86/cpufeature, perf/x86: Add AMD Accumulated Power Mechanism feature flag
AMD CPU family 15h model 0x60 introduces a mechanism for measuring
accumulated power. It is used to report the processor power consumption
and support for it is indicated by CPUID Fn8000_0007_EDX[12].

Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andreas Herrmann <herrmann.der.user@googlemail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hector Marco-Gisbert <hecmargi@upv.es>
Cc: Jacob Shin <jacob.w.shin@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Wan Zongshun <Vincent.Wan@amd.com>
Cc: spg_linux_kernel@amd.com
Link: http://lkml.kernel.org/r/1452739808-11871-4-git-send-email-ray.huang@amd.com
[ Resolved conflict and moved the synthetic CPUID slot to 19. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21 09:35:29 +01:00
Suravee Suthikulpanit
f8519155b4 perf/x86/amd: Add support for new IOMMU performance events
This patch adds new IOMMU performance event based on
the information in table 74 of the AMD I/O Virtualization Technology
(IOMMU) Specification (Document Id: 4882, Rev 2.62, Feb 2015)

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Acked-by: Joerg Roedel <jroedel@suse.de>
Cc: <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://support.amd.com/TechDocs/48882_IOMMU.pdf
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21 09:35:28 +01:00
Huang Rui
8dfeae0d73 perf/x86/amd: Move nodes_per_socket into bsp_init_amd()
nodes_per_socket is static and it needn't be initialized many
times during every CPU core init. So move its initialization into
bsp_init_amd().

Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andreas Herrmann <herrmann.der.user@googlemail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hector Marco-Gisbert <hecmargi@upv.es>
Cc: Jacob Shin <jacob.w.shin@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: spg_linux_kernel@amd.com
Link: http://lkml.kernel.org/r/1452739808-11871-2-git-send-email-ray.huang@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21 09:08:22 +01:00
Peter Zijlstra
27348f382b perf/x86/cqm: Factor out some common code
Having the same code twice (and once quite ugly) is fragile.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21 09:08:22 +01:00
Vikas Shivappa
e7ee3e8cb5 perf/x86/mbm: Add support for MBM counter overflow handling
This patch adds a per package timer which periodically updates the
memory bandwidth counters for the events that are currently active.

Current patch has a periodic timer every 1s since the SDM guarantees
that the counter will not overflow in 1s but this time can be definitely
improved by calibrating on the system. The overflow is really a function
of the max memory b/w that the socket can support, max counter value and
scaling factor.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: fenghua.yu@intel.com
Cc: h.peter.anvin@intel.com
Cc: ravi.v.shankar@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/013b756c5006b1c4ca411f3ecf43ed52f19fbf87.1457723885.git.tony.luck@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21 09:08:21 +01:00
Vikas Shivappa
2d4de8376f perf/x86/mbm: Implement RMID recycling
RMID could be allocated or deallocated as part of RMID recycling.

When an RMID is allocated for MBM event, the MBM counter needs to be
initialized because next time we read the counter we need the previous
value to account for total bytes that went to the memory controller.

Similarly, when RMID is deallocated we need to update the ->count
variable.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: fenghua.yu@intel.com
Cc: h.peter.anvin@intel.com
Cc: ravi.v.shankar@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/1457652732-4499-6-git-send-email-vikas.shivappa@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21 09:08:20 +01:00
Tony Luck
87f01cc2a2 perf/x86/mbm: Add memory bandwidth monitoring event management
Includes all the core infrastructure to measure the total_bytes and
bandwidth.

We have per socket counters for both total system wide L3 external
bytes and local socket memory-controller bytes. The OS does MSR writes
to MSR_IA32_QM_EVTSEL and MSR_IA32_QM_CTR to read the counters and
uses the IA32_PQR_ASSOC_MSR to associate the RMID with the task. The
tasks have a common RMID for CQM (cache quality of service monitoring)
and MBM. Hence most of the scheduling code is reused from CQM.

Signed-off-by: Tony Luck <tony.luck@intel.com>
[ Restructured rmid_read to not have an obvious hole, removed MBM_CNTR_MAX as its unused. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: fenghua.yu@intel.com
Cc: h.peter.anvin@intel.com
Cc: ravi.v.shankar@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/abd7aac9a18d93b95b985b931cf258df0164746d.1457723885.git.tony.luck@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21 09:08:20 +01:00
Vikas Shivappa
33c3cc7acf perf/x86/mbm: Add Intel Memory B/W Monitoring enumeration and init
The MBM init patch enumerates the Intel MBM (Memory b/w monitoring)
and initializes the perf events and datastructures for monitoring the
memory b/w.

Its based on original patch series by Tony Luck and Kanaka Juvva.

Memory bandwidth monitoring (MBM) provides OS/VMM a way to monitor
bandwidth from one level of cache to another. The current patches
support L3 external bandwidth monitoring. It supports both 'local
bandwidth' and 'total bandwidth' monitoring for the socket. Local
bandwidth measures the amount of data sent through the memory controller
on the socket and total b/w measures the total system bandwidth.

Extending the cache quality of service monitoring (CQM) we add two
more events to the perf infrastructure:

  intel_cqm_llc/local_bytes - bytes sent through local socket memory controller
  intel_cqm_llc/total_bytes - total L3 external bytes sent

The tasks are associated with a Resouce Monitoring ID (RMID) just like
in CQM and OS uses a MSR write to indicate the RMID of the task during
scheduling.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: fenghua.yu@intel.com
Cc: h.peter.anvin@intel.com
Cc: ravi.v.shankar@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/1457652732-4499-4-git-send-email-vikas.shivappa@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21 09:08:19 +01:00
Vikas Shivappa
ada2f634cd perf/x86/cqm: Fix CQM memory leak and notifier leak
Fixes the hotcpu notifier leak and other global variable memory leaks
during CQM (cache quality of service monitoring) initialization.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: fenghua.yu@intel.com
Cc: h.peter.anvin@intel.com
Cc: ravi.v.shankar@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/1457652732-4499-3-git-send-email-vikas.shivappa@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21 09:08:19 +01:00
Vikas Shivappa
a223c1c7ab perf/x86/cqm: Fix CQM handling of grouping events into a cache_group
Currently CQM (cache quality of service monitoring) is grouping all
events belonging to same PID to use one RMID. However its not counting
all of these different events. Hence we end up with a count of zero
for all events other than the group leader.

The patch tries to address the issue by keeping a flag in the
perf_event.hw which has other CQM related fields. The field is updated
at event creation and during grouping.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
[peterz: Changed hw_perf_event::is_group_event to an int]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: fenghua.yu@intel.com
Cc: h.peter.anvin@intel.com
Cc: ravi.v.shankar@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/1457652732-4499-2-git-send-email-vikas.shivappa@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21 09:08:18 +01:00
Peter Zijlstra
e8d8a90fc5 perf/x86/BTS: Fix RCU usage
This splat reminds us:

[ 8166.045595] [ INFO: suspicious RCU usage. ]

[ 8166.168972]  [<ffffffff81127837>] lockdep_rcu_suspicious+0xe7/0x120
[ 8166.175966]  [<ffffffff811e0bae>] perf_callchain+0x23e/0x250
[ 8166.182280]  [<ffffffff811dda3d>] perf_prepare_sample+0x27d/0x350
[ 8166.189082]  [<ffffffff8100f503>] intel_pmu_drain_bts_buffer+0x133/0x200

... that as the core code does, one should hold rcu_read_lock() over that
entire BTS event-output generation sequence as well.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21 09:08:17 +01:00
Peter Zijlstra
c2872d381f perf/x86/ibs: Add IBS interrupt to the dynamic throttle
Interrupt throttling is normally only done against
sysctl_perf_event_sample_rate. This means that if that number is too
high (for whatever reason) you can lock up your machine.

We have, however, a dynamic throttling scheme too, but for that to
work, we need to add a callback to the interrupt handler, IBS did not
have this, so add it.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21 09:08:16 +01:00
Peter Zijlstra
5a50f52917 perf/x86/ibs: Fix race with IBS_STARTING state
While tracing the IBS bits I saw the NMI hitting between clearing
IBS_STARTING and the actual MSR writes to disable the counter.

Since IBS_STARTING was cleared, the handler assumed these were spurious
NMIs and because STOPPING wasn't set yet either, insta-triggered an
"Unknown NMI".

Cure this by clearing IBS_STARTING after disabling the hardware.

Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21 09:08:15 +01:00
Peter Zijlstra
0158b83f75 perf/x86/ibs: Fix IBS throttle
When the IBS IRQ handler get a !0 return from perf_event_overflow;
meaning it should throttle the event, it only disables it, it doesn't
call perf_ibs_stop().

This confuses the state machine, as we'll use pmu::start() ->
perf_ibs_start() to unthrottle.

Tested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vince@deater.net>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: dvyukov@google.com
Cc: oleg@redhat.com
Cc: panand@redhat.com
Cc: sasha.levin@oracle.com
Link: http://lkml.kernel.org/r/20160311142346.GE6344@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21 09:08:15 +01:00
Linus Torvalds
643ad15d47 Merge branch 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 protection key support from Ingo Molnar:
 "This tree adds support for a new memory protection hardware feature
  that is available in upcoming Intel CPUs: 'protection keys' (pkeys).

  There's a background article at LWN.net:

      https://lwn.net/Articles/643797/

  The gist is that protection keys allow the encoding of
  user-controllable permission masks in the pte.  So instead of having a
  fixed protection mask in the pte (which needs a system call to change
  and works on a per page basis), the user can map a (handful of)
  protection mask variants and can change the masks runtime relatively
  cheaply, without having to change every single page in the affected
  virtual memory range.

  This allows the dynamic switching of the protection bits of large
  amounts of virtual memory, via user-space instructions.  It also
  allows more precise control of MMU permission bits: for example the
  executable bit is separate from the read bit (see more about that
  below).

  This tree adds the MM infrastructure and low level x86 glue needed for
  that, plus it adds a high level API to make use of protection keys -
  if a user-space application calls:

        mmap(..., PROT_EXEC);

  or

        mprotect(ptr, sz, PROT_EXEC);

  (note PROT_EXEC-only, without PROT_READ/WRITE), the kernel will notice
  this special case, and will set a special protection key on this
  memory range.  It also sets the appropriate bits in the Protection
  Keys User Rights (PKRU) register so that the memory becomes unreadable
  and unwritable.

  So using protection keys the kernel is able to implement 'true'
  PROT_EXEC on x86 CPUs: without protection keys PROT_EXEC implies
  PROT_READ as well.  Unreadable executable mappings have security
  advantages: they cannot be read via information leaks to figure out
  ASLR details, nor can they be scanned for ROP gadgets - and they
  cannot be used by exploits for data purposes either.

  We know about no user-space code that relies on pure PROT_EXEC
  mappings today, but binary loaders could start making use of this new
  feature to map binaries and libraries in a more secure fashion.

  There is other pending pkeys work that offers more high level system
  call APIs to manage protection keys - but those are not part of this
  pull request.

  Right now there's a Kconfig that controls this feature
  (CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) that is default enabled
  (like most x86 CPU feature enablement code that has no runtime
  overhead), but it's not user-configurable at the moment.  If there's
  any serious problem with this then we can make it configurable and/or
  flip the default"

* 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
  x86/mm/pkeys: Fix mismerge of protection keys CPUID bits
  mm/pkeys: Fix siginfo ABI breakage caused by new u64 field
  x86/mm/pkeys: Fix access_error() denial of writes to write-only VMA
  mm/core, x86/mm/pkeys: Add execute-only protection keys support
  x86/mm/pkeys: Create an x86 arch_calc_vm_prot_bits() for VMA flags
  x86/mm/pkeys: Allow kernel to modify user pkey rights register
  x86/fpu: Allow setting of XSAVE state
  x86/mm: Factor out LDT init from context init
  mm/core, x86/mm/pkeys: Add arch_validate_pkey()
  mm/core, arch, powerpc: Pass a protection key in to calc_vm_flag_bits()
  x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU
  x86/mm/pkeys: Add Kconfig prompt to existing config option
  x86/mm/pkeys: Dump pkey from VMA in /proc/pid/smaps
  x86/mm/pkeys: Dump PKRU with other kernel registers
  mm/core, x86/mm/pkeys: Differentiate instruction fetches
  x86/mm/pkeys: Optimize fault handling in access_error()
  mm/core: Do not enforce PKEY permissions on remote mm access
  um, pkeys: Add UML arch_*_access_permitted() methods
  mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys
  x86/mm/gup: Simplify get_user_pages() PTE bit handling
  ...
2016-03-20 19:08:56 -07:00
Linus Torvalds
24b5e20f11 Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Ingo Molnar:
 "The main changes are:

   - Use separate EFI page tables when executing EFI firmware code.
     This isolates the EFI context from the rest of the kernel, which
     has security and general robustness advantages.  (Matt Fleming)

   - Run regular UEFI firmware with interrupts enabled.  This is already
     the status quo under other OSs.  (Ard Biesheuvel)

   - Various x86 EFI enhancements, such as the use of non-executable
     attributes for EFI memory mappings.  (Sai Praneeth Prakhya)

   - Various arm64 UEFI enhancements.  (Ard Biesheuvel)

   - ... various fixes and cleanups.

  The separate EFI page tables feature got delayed twice already,
  because it's an intrusive change and we didn't feel confident about
  it - third time's the charm we hope!"

* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
  x86/mm/pat: Fix boot crash when 1GB pages are not supported by the CPU
  x86/efi: Only map kernel text for EFI mixed mode
  x86/efi: Map EFI_MEMORY_{XP,RO} memory region bits to EFI page tables
  x86/mm/pat: Don't implicitly allow _PAGE_RW in kernel_map_pages_in_pgd()
  efi/arm*: Perform hardware compatibility check
  efi/arm64: Check for h/w support before booting a >4 KB granular kernel
  efi/arm: Check for LPAE support before booting a LPAE kernel
  efi/arm-init: Use read-only early mappings
  efi/efistub: Prevent __init annotations from being used
  arm64/vmlinux.lds.S: Handle .init.rodata.xxx and .init.bss sections
  efi/arm64: Drop __init annotation from handle_kernel_image()
  x86/mm/pat: Use _PAGE_GLOBAL bit for EFI page table mappings
  efi/runtime-wrappers: Run UEFI Runtime Services with interrupts enabled
  efi: Reformat GUID tables to follow the format in UEFI spec
  efi: Add Persistent Memory type name
  efi: Add NV memory attribute
  x86/efi: Show actual ending addresses in efi_print_memmap
  x86/efi/bgrt: Don't ignore the BGRT if the 'valid' bit is 0
  efivars: Use to_efivar_entry
  efi: Runtime-wrapper: Get rid of the rtc_lock spinlock
  ...
2016-03-20 18:58:18 -07:00
Linus Torvalds
26660a4046 Merge branch 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull 'objtool' stack frame validation from Ingo Molnar:
 "This tree adds a new kernel build-time object file validation feature
  (ONFIG_STACK_VALIDATION=y): kernel stack frame correctness validation.
  It was written by and is maintained by Josh Poimboeuf.

  The motivation: there's a category of hard to find kernel bugs, most
  of them in assembly code (but also occasionally in C code), that
  degrades the quality of kernel stack dumps/backtraces.  These bugs are
  hard to detect at the source code level.  Such bugs result in
  incorrect/incomplete backtraces most of time - but can also in some
  rare cases result in crashes or other undefined behavior.

  The build time correctness checking is done via the new 'objtool'
  user-space utility that was written for this purpose and which is
  hosted in the kernel repository in tools/objtool/.  The tool's (very
  simple) UI and source code design is shaped after Git and perf and
  shares quite a bit of infrastructure with tools/perf (which tooling
  infrastructure sharing effort got merged via perf and is already
  upstream).  Objtool follows the well-known kernel coding style.

  Objtool does not try to check .c or .S files, it instead analyzes the
  resulting .o generated machine code from first principles: it decodes
  the instruction stream and interprets it.  (Right now objtool supports
  the x86-64 architecture.)

  From tools/objtool/Documentation/stack-validation.txt:

   "The kernel CONFIG_STACK_VALIDATION option enables a host tool named
    objtool which runs at compile time.  It has a "check" subcommand
    which analyzes every .o file and ensures the validity of its stack
    metadata.  It enforces a set of rules on asm code and C inline
    assembly code so that stack traces can be reliable.

    Currently it only checks frame pointer usage, but there are plans to
    add CFI validation for C files and CFI generation for asm files.

    For each function, it recursively follows all possible code paths
    and validates the correct frame pointer state at each instruction.

    It also follows code paths involving special sections, like
    .altinstructions, __jump_table, and __ex_table, which can add
    alternative execution paths to a given instruction (or set of
    instructions).  Similarly, it knows how to follow switch statements,
    for which gcc sometimes uses jump tables."

  When this new kernel option is enabled (it's disabled by default), the
  tool, if it finds any suspicious assembly code pattern, outputs
  warnings in compiler warning format:

    warning: objtool: rtlwifi_rate_mapping()+0x2e7: frame pointer state mismatch
    warning: objtool: cik_tiling_mode_table_init()+0x6ce: call without frame pointer save/setup
    warning: objtool:__schedule()+0x3c0: duplicate frame pointer save
    warning: objtool:__schedule()+0x3fd: sibling call from callable instruction with changed frame pointer

  ... so that scripts that pick up compiler warnings will notice them.
  All known warnings triggered by the tool are fixed by the tree, most
  of the commits in fact prepare the kernel to be warning-free.  Most of
  them are bugfixes or cleanups that stand on their own, but there are
  also some annotations of 'special' stack frames for justified cases
  such entries to JIT-ed code (BPF) or really special boot time code.

  There are two other long-term motivations behind this tool as well:

   - To improve the quality and reliability of kernel stack frames, so
     that they can be used for optimized live patching.

   - To create independent infrastructure to check the correctness of
     CFI stack frames at build time.  CFI debuginfo is notoriously
     unreliable and we cannot use it in the kernel as-is without extra
     checking done both on the kernel side and on the build side.

  The quality of kernel stack frames matters to debuggability as well,
  so IMO we can merge this without having to consider the live patching
  or CFI debuginfo angle"

* 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
  objtool: Only print one warning per function
  objtool: Add several performance improvements
  tools: Copy hashtable.h into tools directory
  objtool: Fix false positive warnings for functions with multiple switch statements
  objtool: Rename some variables and functions
  objtool: Remove superflous INIT_LIST_HEAD
  objtool: Add helper macros for traversing instructions
  objtool: Fix false positive warnings related to sibling calls
  objtool: Compile with debugging symbols
  objtool: Detect infinite recursion
  objtool: Prevent infinite recursion in noreturn detection
  objtool: Detect and warn if libelf is missing and don't break the build
  tools: Support relative directory path for 'O='
  objtool: Support CROSS_COMPILE
  x86/asm/decoder: Use explicitly signed chars
  objtool: Enable stack metadata validation on 64-bit x86
  objtool: Add CONFIG_STACK_VALIDATION option
  objtool: Add tool to perform compile-time stack metadata validation
  x86/kprobes: Mark kretprobe_trampoline() stack frame as non-standard
  sched: Always inline context_switch()
  ...
2016-03-20 18:23:21 -07:00
Ard Biesheuvel
142b9e6c9d x86/kallsyms: fix GOLD link failure with new relative kallsyms table format
Commit 2213e9a66b ("kallsyms: add support for relative offsets in
kallsyms address table") changed the default kallsyms symbol table
format to use relative references rather than absolute addresses.

This reduces the size of the kallsyms symbol table by 50% on 64-bit
architectures, and further reduces the size of the relocation tables
used by relocatable kernels.  Since the memory footprint of the static
kernel image is always much smaller than 4 GB, these relative references
are assumed to be representable in 32 bits, even when the native word
size is 64 bits.

On 64-bit architectures, this obviously only works if the distance
between each relative reference and the chosen anchor point is
representable in 32 bits, and so the table generation code in
scripts/kallsyms.c scans the table for the lowest value that is covered
by the kernel text, and selects it as the anchor point.

However, when using the GOLD linker rather than the default BFD linker
to build the x86_64 kernel, the symbol phys_offset_64, which is the
result of arithmetic defined in the linker script, is emitted as a 'T'
rather than an 'A' type symbol, resulting in scripts/kallsyms.c to
mistake it for a suitable anchor point, even though it is far away from
the actual kernel image in the virtual address space.  This results in
out-of-range warnings from scripts/kallsyms.c and a broken build.

So let's align with the BFD linker, and emit the phys_offset_[32|64]
symbols as absolute symbols explicitly.  Note that the out of range
issue does not exist on 32-bit x86, but this patch changes both symbols
for symmetry.

Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-20 13:52:37 -07:00
Linus Torvalds
3c2de27d79 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:

 - Preparations of parallel lookups (the remaining main obstacle is the
   need to move security_d_instantiate(); once that becomes safe, the
   rest will be a matter of rather short series local to fs/*.c

 - preadv2/pwritev2 series from Christoph

 - assorted fixes

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (32 commits)
  splice: handle zero nr_pages in splice_to_pipe()
  vfs: show_vfsstat: do not ignore errors from show_devname method
  dcache.c: new helper: __d_add()
  don't bother with __d_instantiate(dentry, NULL)
  untangle fsnotify_d_instantiate() a bit
  uninline d_add()
  replace d_add_unique() with saner primitive
  quota: use lookup_one_len_unlocked()
  cifs_get_root(): use lookup_one_len_unlocked()
  nfs_lookup: don't bother with d_instantiate(dentry, NULL)
  kill dentry_unhash()
  ceph_fill_trace(): don't bother with d_instantiate(dn, NULL)
  autofs4: don't bother with d_instantiate(dentry, NULL) in ->lookup()
  configfs: move d_rehash() into configfs_create() for regular files
  ceph: don't bother with d_rehash() in splice_dentry()
  namei: teach lookup_slow() to skip revalidate
  namei: massage lookup_slow() to be usable by lookup_one_len_unlocked()
  lookup_one_len_unlocked(): use lookup_dcache()
  namei: simplify invalidation logics in lookup_dcache()
  namei: change calling conventions for lookup_{fast,slow} and follow_managed()
  ...
2016-03-19 18:52:29 -07:00
Linus Torvalds
1200b6809d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Highlights:

   1) Support more Realtek wireless chips, from Jes Sorenson.

   2) New BPF types for per-cpu hash and arrap maps, from Alexei
      Starovoitov.

   3) Make several TCP sysctls per-namespace, from Nikolay Borisov.

   4) Allow the use of SO_REUSEPORT in order to do per-thread processing
   of incoming TCP/UDP connections.  The muxing can be done using a
   BPF program which hashes the incoming packet.  From Craig Gallek.

   5) Add a multiplexer for TCP streams, to provide a messaged based
      interface.  BPF programs can be used to determine the message
      boundaries.  From Tom Herbert.

   6) Add 802.1AE MACSEC support, from Sabrina Dubroca.

   7) Avoid factorial complexity when taking down an inetdev interface
      with lots of configured addresses.  We were doing things like
      traversing the entire address less for each address removed, and
      flushing the entire netfilter conntrack table for every address as
      well.

   8) Add and use SKB bulk free infrastructure, from Jesper Brouer.

   9) Allow offloading u32 classifiers to hardware, and implement for
      ixgbe, from John Fastabend.

  10) Allow configuring IRQ coalescing parameters on a per-queue basis,
      from Kan Liang.

  11) Extend ethtool so that larger link mode masks can be supported.
      From David Decotigny.

  12) Introduce devlink, which can be used to configure port link types
      (ethernet vs Infiniband, etc.), port splitting, and switch device
      level attributes as a whole.  From Jiri Pirko.

  13) Hardware offload support for flower classifiers, from Amir Vadai.

  14) Add "Local Checksum Offload".  Basically, for a tunneled packet
      the checksum of the outer header is 'constant' (because with the
      checksum field filled into the inner protocol header, the payload
      of the outer frame checksums to 'zero'), and we can take advantage
      of that in various ways.  From Edward Cree"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1548 commits)
  bonding: fix bond_get_stats()
  net: bcmgenet: fix dma api length mismatch
  net/mlx4_core: Fix backward compatibility on VFs
  phy: mdio-thunder: Fix some Kconfig typos
  lan78xx: add ndo_get_stats64
  lan78xx: handle statistics counter rollover
  RDS: TCP: Remove unused constant
  RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket
  net: smc911x: convert pxa dma to dmaengine
  team: remove duplicate set of flag IFF_MULTICAST
  bonding: remove duplicate set of flag IFF_MULTICAST
  net: fix a comment typo
  ethernet: micrel: fix some error codes
  ip_tunnels, bpf: define IP_TUNNEL_OPTS_MAX and use it
  bpf, dst: add and use dst_tclassid helper
  bpf: make skb->tc_classid also readable
  net: mvneta: bm: clarify dependencies
  cls_bpf: reset class and reuse major in da
  ldmvsw: Checkpatch sunvnet.c and sunvnet_common.c
  ldmvsw: Add ldmvsw.c driver code
  ...
2016-03-19 10:05:34 -07:00
Thomas Gleixner
57d335ce88 x86/oprofile/nmi: Add missing hotplug FROZEN handling
We really do not want to keep that nmi enabled across suspend/resume.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-03-19 13:40:08 +01:00
Thomas Gleixner
f80be5e3d5 x86/hpet: Use proper mask to modify hotplug action
Magic hex constants are a guarantee for wreckage when the defines change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-03-19 13:40:08 +01:00
Thomas Gleixner
f47ab81aca x86/apic/uv: Fix the hotplug notifier
The notifier is missing the CPU_DOWN_FAILED transition. That leaves the
heartbeat disabled when CPU_DOWN_PREPARE fails.

It also does not handle the FROZEN transition variants. That might not be an
issue for UV, but it's inconsistent.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
2016-03-19 13:40:08 +01:00
Thomas Gleixner
a38f98735e x86/apb/timer: Use proper mask to modify hotplug action
Magic hex constants are a guarantee for wreckage when the defines change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-03-19 13:40:08 +01:00
Thomas Gleixner
3e8db2246b x86/topology: Use total_cpus not nr_cpu_ids for logical packages
nr_cpu_ids can be limited on the command line via nr_cpus=. That can break the
logical package management because it results in a smaller number of packages,
but the cpus to online are occupying the full package space as the hyper
threads are enumerated after the physical cores typically.

total_cpus is the real possible cpu space not limited by nr_cpus command line
and gives us the proper number of packages.

Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Fixes: 1f12e32f4c ("x86/topology: Create logical package id")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Xiong Zhou <jencce.kernel@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andreas Herrmann <aherrmann@suse.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1603181254330.3978@nanos
2016-03-19 10:26:40 +01:00
Peter Zijlstra
63d1e995be x86/topology: Fix Intel HT disable
As per the comment in the code; due to BIOS it is sometimes impossible to know
if there actually are smp siblings until the machine is fully enumerated. So
we rather overestimate the number of possible packages.

Fixes: 1f12e32f4c ("x86/topology: Create logical package id")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: aherrmann@suse.com
Cc: jencce.kernel@gmail.com
Cc: bp@alien8.de
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Link: http://lkml.kernel.org/r/20160318150538.611014173@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-03-19 10:26:40 +01:00
Peter Zijlstra
b5d5f27d93 x86/topology: Fix logical package mapping
That first branch testing pkg against __max_logical_packages is wrong,
because if the first pkg id is larger, then the find_first_zero will
find us logical package id 0. However, if the second pkg id is indeed
0, we'll again claim it without testing if it was already taken.

Also, it fails to print the mapping.

Fixes: 1f12e32f4c ("x86/topology: Create logical package id")
Reported-by: Xiong Zhou <jencce.kernel@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: aherrmann@suse.com
Cc: bp@alien8.de
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Link: http://lkml.kernel.org/r/20160317095220.GO6344@twins.programming.kicks-ass.net
Link: http://lkml.kernel.org/r/20160318150538.482393396@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-03-19 10:26:40 +01:00
Linus Torvalds
814a2bf957 Merge branch 'akpm' (patches from Andrew)
Merge second patch-bomb from Andrew Morton:

 - a couple of hotfixes

 - the rest of MM

 - a new timer slack control in procfs

 - a couple of procfs fixes

 - a few misc things

 - some printk tweaks

 - lib/ updates, notably to radix-tree.

 - add my and Nick Piggin's old userspace radix-tree test harness to
   tools/testing/radix-tree/.  Matthew said it was a godsend during the
   radix-tree work he did.

 - a few code-size improvements, switching to __always_inline where gcc
   screwed up.

 - partially implement character sets in sscanf

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits)
  sscanf: implement basic character sets
  lib/bug.c: use common WARN helper
  param: convert some "on"/"off" users to strtobool
  lib: add "on"/"off" support to kstrtobool
  lib: update single-char callers of strtobool()
  lib: move strtobool() to kstrtobool()
  include/linux/unaligned: force inlining of byteswap operations
  include/uapi/linux/byteorder, swab: force inlining of some byteswap operations
  include/asm-generic/atomic-long.h: force inlining of some atomic_long operations
  usb: common: convert to use match_string() helper
  ide: hpt366: convert to use match_string() helper
  ata: hpt366: convert to use match_string() helper
  power: ab8500: convert to use match_string() helper
  power: charger_manager: convert to use match_string() helper
  drm/edid: convert to use match_string() helper
  pinctrl: convert to use match_string() helper
  device property: convert to use match_string() helper
  lib/string: introduce match_string() helper
  radix-tree tests: add test for radix_tree_iter_next
  radix-tree tests: add regression3 test
  ...
2016-03-18 19:26:54 -07:00
Al Viro
8b23a8ce10 Merge branches 'work.lookups', 'work.misc' and 'work.preadv2' into for-next 2016-03-18 16:07:38 -04:00
Li Bin
9d2099ab05 x86: ftrace: Fix the misleading comment for arch/x86/kernel/ftrace.c
Fix the misleading comment for arch/x86/kernel/ftrace.c that it
had used nop instead of jmp.

Signed-off-by: Li Bin <huawei.libin@huawei.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-03-18 15:54:01 -04:00
Thomas Gleixner
551adc6057 x86/irq: Cure live lock in fixup_irqs()
Harry reported, that he's able to trigger a system freeze with cpu hot
unplug. The freeze turned out to be a live lock caused by recent changes in
irq_force_complete_move().

When fixup_irqs() and from there irq_force_complete_move() is called on the
dying cpu, then all other cpus are in stop machine an wait for the dying cpu
to complete the teardown. If there is a move of an interrupt pending then
irq_force_complete_move() sends the cleanup IPI to the cpus in the old_domain
mask and waits for them to clear the mask. That's obviously impossible as
those cpus are firmly stuck in stop machine with interrupts disabled.

I should have known that, but I completely overlooked it being concentrated on
the locking issues around the vectors. And the existance of the call to
__irq_complete_move() in the code, which actually sends the cleanup IPI made
it reasonable to wait for that cleanup to complete. That call was bogus even
before the recent changes as it was just a pointless distraction.

We have to look at two cases:

1) The move_in_progress flag of the interrupt is set

   This means the ioapic has been updated with the new vector, but it has not
   fired yet. In theory there is a race:

   set_ioapic(new_vector) <-- Interrupt is raised before update is effective,
   			      i.e. it's raised on the old vector. 

   So if the target cpu cannot handle that interrupt before the old vector is
   cleaned up, we get a spurious interrupt and in the worst case the ioapic
   irq line becomes stale, but my experiments so far have only resulted in
   spurious interrupts.

   But in case of cpu hotplug this should be a non issue because if the
   affinity update happens right before all cpus rendevouz in stop machine,
   there is no way that the interrupt can be blocked on the target cpu because
   all cpus loops first with interrupts enabled in stop machine, so the old
   vector is not yet cleaned up when the interrupt fires.

   So the only way to run into this issue is if the delivery of the interrupt
   on the apic/system bus would be delayed beyond the point where the target
   cpu disables interrupts in stop machine. I doubt that it can happen, but at
   least there is a theroretical chance. Virtualization might be able to
   expose this, but AFAICT the IOAPIC emulation is not as stupid as the real
   hardware.

   I've spent quite some time over the weekend to enforce that situation,
   though I was not able to trigger the delayed case.

2) The move_in_progress flag is not set and the old_domain cpu mask is not
   empty.

   That means, that an interrupt was delivered after the change and the
   cleanup IPI has been sent to the cpus in old_domain, but not all CPUs have
   responded to it yet.

In both cases we can assume that the next interrupt will arrive on the new
vector, so we can cleanup the old vectors on the cpus in the old_domain cpu
mask.

Fixes: 98229aa36c "x86/irq: Plug vector cleanup race"
Reported-by: Harry Junior <harryjr@outlook.fr>
Tested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1603140931430.3657@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-03-18 14:51:06 +01:00
Thomas Gleixner
f508a5ba7a x86/tsc: Prevent NULL pointer deref in calibrate_delay_is_known()
The topology_core_cpumask is used to find a neighbour cpu in
calibrate_delay_is_known(). It might not be allocated at the first invocation
of that function on the boot cpu, when CONFIG_CPUMASK_OFFSTACK is set.

The mask is allocated later in native_smp_prepare_cpus. As a consequence the
underlying find_next_bit() call dereferences a NULL pointer.

Add a proper check to prevent this.

Fixes: c25323c073 "x86/tsc: Use topology functions"
Reported-and-tested-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1603180843270.3978@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-03-18 14:51:06 +01:00
Dave Jones
7834c10313 x86/apic: Fix suspicious RCU usage in smp_trace_call_function_interrupt()
Since 4.4, I've been able to trigger this occasionally:

===============================
[ INFO: suspicious RCU usage. ]
4.5.0-rc7-think+ #3 Not tainted
Cc: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/20160315012054.GA17765@codemonkey.org.uk
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

-------------------------------
./arch/x86/include/asm/msr-trace.h:47 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

RCU used illegally from idle CPU!
rcu_scheduler_active = 1, debug_locks = 1
RCU used illegally from extended quiescent state!
no locks held by swapper/3/0.

stack backtrace:
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.5.0-rc7-think+ #3
 ffffffff92f821e0 1f3e5c340597d7fc ffff880468e07f10 ffffffff92560c2a
 ffff880462145280 0000000000000001 ffff880468e07f40 ffffffff921376a6
 ffffffff93665ea0 0000cc7c876d28da 0000000000000005 ffffffff9383dd60
Call Trace:
 <IRQ>  [<ffffffff92560c2a>] dump_stack+0x67/0x9d
 [<ffffffff921376a6>] lockdep_rcu_suspicious+0xe6/0x100
 [<ffffffff925ae7a7>] do_trace_write_msr+0x127/0x1a0
 [<ffffffff92061c83>] native_apic_msr_eoi_write+0x23/0x30
 [<ffffffff92054408>] smp_trace_call_function_interrupt+0x38/0x360
 [<ffffffff92d1ca60>] trace_call_function_interrupt+0x90/0xa0
 <EOI>  [<ffffffff92ac5124>] ? cpuidle_enter_state+0x1b4/0x520

Move the entering_irq() call before ack_APIC_irq(), because entering_irq()
tells the RCU susbstems to end the extended quiescent state, so that the
following trace call in ack_APIC_irq() works correctly.

Suggested-by: Andi Kleen <ak@linux.intel.com>
Fixes: 4787c368a9 "x86/tracing: Add irq_enter/exit() in smp_trace_reschedule_interrupt()"
Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
2016-03-18 14:51:06 +01:00
Linus Torvalds
0f49fc95b8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching
Pull livepatching update from Jiri Kosina:

 - cleanup of module notifiers; this depends on a module.c cleanup which
   has been acked by Rusty; from Jessica Yu

 - small assorted fixes and MAINTAINERS update

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
  livepatch/module: remove livepatch module notifier
  modules: split part of complete_formation() into prepare_coming_module()
  livepatch: Update maintainers
  livepatch: Fix the error message about unresolvable ambiguity
  klp: remove CONFIG_LIVEPATCH dependency from klp headers
  klp: remove superfluous errors in asm/livepatch.h
2016-03-17 21:46:32 -07:00
Linus Torvalds
1a46712aa9 This is the bulk of GPIO changes for kernel v4.6:
Core changes:
 
 - The gpio_chip is now a *real device*. Until now the gpio chips
   were just piggybacking the parent device or (gasp) floating in
   space outside of the device model. We now finally make GPIO chips
   devices. The gpio_chip will create a gpio_device which contains
   a struct device, and this gpio_device struct is kept private.
   Anything that needs to be kept private from the rest of the kernel
   will gradually be moved over to the gpio_device.
 
 - As a result of making the gpio_device a real device, we have added
   resource management, so devm_gpiochip_add_data() will cut down on
   overhead and reduce code lines. A huge slew of patches convert
   almost all drivers in the subsystem to use this.
 
 - Building on making the GPIO a real device, we add the first step
   of a new userspace ABI: the GPIO character device. We take small
   steps here, so we first add a pure *information* ABI and the tool
   "lsgpio" that will list all GPIO devices on the system and all
   lines on these devices. We can now discover GPIOs properly from
   userspace. We still have not come up with a way to actually *use*
   GPIOs from userspace.
 
 - To encourage people to use the character device for the future,
   we have it always-enabled when using GPIO. The old sysfs ABI is
   still opt-in (and can be used in parallel), but is marked as
   deprecated. We will keep it around for the foreseeable future,
   but it will not be extended to cover ever more use cases.
 
 Cleanup:
 
 - Bjorn Helgaas removed a whole slew of per-architecture <asm/gpio.h>
   includes. This dates back to when GPIO was an opt-in feature and
   no shared library even existed: just a header file with proper
   prototypes was provided and all semantics were up to the arch to
   implement. These patches make the GPIO chip even more a proper
   device and cleans out leftovers of the old in-kernel API here
   and there. Still some cruft is left but it's very little now.
 
 - There is still some clamping of return values for .get() going
   on, but we now return sane values in the vast majority of drivers
   and the errorpath is sanitized. Some patches for powerpc, blackfin
   and unicore still drop in.
 
 - We continue to switch the ARM, MIPS, blackfin, m68k local GPIO
   implementations to use gpiochip_add_data() and cut down on code
   lines.
 
 - MPC8xxx is converted to use the generic GPIO helpers.
 
 - ATH79 is converted to use the generic GPIO helpers.
 
 New drivers:
 
 - WinSystems WS16C48
 
 - Acces 104-DIO-48E
 
 - F81866 (a F7188x variant)
 
 - Qoric (a MPC8xxx variant)
 
 - TS-4800
 
 - SPI serializers (pisosr): simple 74xx shift registers connected
   to SPI to obtain a dirt-cheap output-only GPIO expander.
 
 - Texas Instruments TPIC2810
 
 - Texas Instruments TPS65218
 
 - Texas Instruments TPS65912
 
 - X-Gene (ARM64) standby GPIO controller
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW6m24AAoJEEEQszewGV1zUasP/RpTrjRcNI5QFHjudd2oioDx
 R/IljC06Q072ZqVy/MR7QxwhoU8jUnCgKgv4rgMa1OcfHblxC2R1+YBKOUSij831
 E+SYmYDYmoMhN7j5Aslr66MXg1rLdFSdCZWemuyNruAK8bx6cTE1AWS8AELQzzTn
 Re/CPpCDbujLy0ZK2wJHgr9ZkdcBGICtDRCrOR3Kyjpwk/DSZcruK1PDN+VQMI3k
 bJlwgtGenOHINgCq/16edpwj/hzmoJXhTOZXJHI5XVR6czTwb3SvCYACvCkauI/a
 /N7b3quG88b5y0OPQPVxp5+VVl9GyVcv5oGzIfTNat/g5QinShZIT4kVV9r0xu6/
 TQHh1HlXleh+QI3yX0oRv9ztHreMf+vdpw1dhIwLqHqfJ7AWdOGk7BbKjwCrsOoq
 t/qUVFnyvooLpyr53Z5JY8+LqyynHF68G+jUQyHLgTZ0GCE+z+1jqNl1T501n3kv
 3CSlNYxSN/YUBN3cnroAIU/ZWcV4YRdxmOtEWP+7xgcdzTE6s/JHb2fuEfVHzWPf
 mHWtJGy8U0IR4VSSEln5RtjhRr0PAjTHeTOGAmivUnaIGDziTowyUVF+X5hwC77E
 DGTuLVx/Kniv173DK7xNAsUZNAETBa3fQZTgu+RfOpMiM1FZc7tI1rd7K7PjbyCc
 d2M0gcq+d11ITJTxC7OM
 =9AJ4
 -----END PGP SIGNATURE-----

Merge tag 'gpio-v4.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull GPIO updates from Linus Walleij:
 "This is the bulk of GPIO changes for kernel v4.6.  There is quite a
  lot of interesting stuff going on.

  The patches to other subsystems and arch-wide are ACKed as far as
  possible, though I consider things like per-arch <asm/gpio.h> as
  essentially a part of the GPIO subsystem so it should not be needed.

  Core changes:

   - The gpio_chip is now a *real device*.  Until now the gpio chips
     were just piggybacking the parent device or (gasp) floating in
     space outside of the device model.

     We now finally make GPIO chips devices.  The gpio_chip will create
     a gpio_device which contains a struct device, and this gpio_device
     struct is kept private.  Anything that needs to be kept private
     from the rest of the kernel will gradually be moved over to the
     gpio_device.

   - As a result of making the gpio_device a real device, we have added
     resource management, so devm_gpiochip_add_data() will cut down on
     overhead and reduce code lines.  A huge slew of patches convert
     almost all drivers in the subsystem to use this.

   - Building on making the GPIO a real device, we add the first step of
     a new userspace ABI: the GPIO character device.  We take small
     steps here, so we first add a pure *information* ABI and the tool
     "lsgpio" that will list all GPIO devices on the system and all
     lines on these devices.

     We can now discover GPIOs properly from userspace.  We still have
     not come up with a way to actually *use* GPIOs from userspace.

   - To encourage people to use the character device for the future, we
     have it always-enabled when using GPIO.  The old sysfs ABI is still
     opt-in (and can be used in parallel), but is marked as deprecated.

     We will keep it around for the foreseeable future, but it will not
     be extended to cover ever more use cases.

  Cleanup:

   - Bjorn Helgaas removed a whole slew of per-architecture <asm/gpio.h>
     includes.

     This dates back to when GPIO was an opt-in feature and no shared
     library even existed: just a header file with proper prototypes was
     provided and all semantics were up to the arch to implement.  These
     patches make the GPIO chip even more a proper device and cleans out
     leftovers of the old in-kernel API here and there.

     Still some cruft is left but it's very little now.

   - There is still some clamping of return values for .get() going on,
     but we now return sane values in the vast majority of drivers and
     the errorpath is sanitized.  Some patches for powerpc, blackfin and
     unicore still drop in.

   - We continue to switch the ARM, MIPS, blackfin, m68k local GPIO
     implementations to use gpiochip_add_data() and cut down on code
     lines.

   - MPC8xxx is converted to use the generic GPIO helpers.

   - ATH79 is converted to use the generic GPIO helpers.

  New drivers:

   - WinSystems WS16C48

   - Acces 104-DIO-48E

   - F81866 (a F7188x variant)

   - Qoric (a MPC8xxx variant)

   - TS-4800

   - SPI serializers (pisosr): simple 74xx shift registers connected to
     SPI to obtain a dirt-cheap output-only GPIO expander.

   - Texas Instruments TPIC2810

   - Texas Instruments TPS65218

   - Texas Instruments TPS65912

   - X-Gene (ARM64) standby GPIO controller"

* tag 'gpio-v4.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (194 commits)
  Revert "Share upstreaming patches"
  gpio: mcp23s08: Fix clearing of interrupt.
  gpiolib: Fix comment referring to gpio_*() in gpiod_*()
  gpio: pca953x: Fix pca953x_gpio_set_multiple() on 64-bit
  gpio: xgene: Fix kconfig for standby GIPO contoller
  gpio: Add generic serializer DT binding
  gpio: uapi: use 0xB4 as ioctl() major
  gpio: tps65912: fix bad merge
  Revert "gpio: lp3943: Drop pin_used and lp3943_gpio_request/lp3943_gpio_free"
  gpio: omap: drop dev field from gpio_bank structure
  gpio: mpc8xxx: Slightly update the code for better readability
  gpio: mpc8xxx: Remove *read_reg and *write_reg from struct mpc8xxx_gpio_chip
  gpio: mpc8xxx: Fixup setting gpio direction output
  gpio: mcp23s08: Add support for mcp23s18
  dt-bindings: gpio: altera: Fix altr,interrupt-type property
  gpio: add driver for MEN 16Z127 GPIO controller
  gpio: lp3943: Drop pin_used and lp3943_gpio_request/lp3943_gpio_free
  gpio: timberdale: Switch to devm_ioremap_resource()
  gpio: ts4800: Add IMX51 dependency
  gpiolib: rewrite gpiodev_add_to_list
  ...
2016-03-17 21:05:32 -07:00
Linus Torvalds
588ab3f9af arm64 updates for 4.6:
- Initial page table creation reworked to avoid breaking large block
   mappings (huge pages) into smaller ones. The ARM architecture requires
   break-before-make in such cases to avoid TLB conflicts but that's not
   always possible on live page tables
 
 - Kernel virtual memory layout: the kernel image is no longer linked to
   the bottom of the linear mapping (PAGE_OFFSET) but at the bottom of
   the vmalloc space, allowing the kernel to be loaded (nearly) anywhere
   in physical RAM
 
 - Kernel ASLR: position independent kernel Image and modules being
   randomly mapped in the vmalloc space with the randomness is provided
   by UEFI (efi_get_random_bytes() patches merged via the arm64 tree,
   acked by Matt Fleming)
 
 - Implement relative exception tables for arm64, required by KASLR
   (initial code for ARCH_HAS_RELATIVE_EXTABLE added to lib/extable.c but
   actual x86 conversion to deferred to 4.7 because of the merge
   dependencies)
 
 - Support for the User Access Override feature of ARMv8.2: this allows
   uaccess functions (get_user etc.) to be implemented using LDTR/STTR
   instructions. Such instructions, when run by the kernel, perform
   unprivileged accesses adding an extra level of protection. The
   set_fs() macro is used to "upgrade" such instruction to privileged
   accesses via the UAO bit
 
 - Half-precision floating point support (part of ARMv8.2)
 
 - Optimisations for CPUs with or without a hardware prefetcher (using
   run-time code patching)
 
 - copy_page performance improvement to deal with 128 bytes at a time
 
 - Sanity checks on the CPU capabilities (via CPUID) to prevent
   incompatible secondary CPUs from being brought up (e.g. weird
   big.LITTLE configurations)
 
 - valid_user_regs() reworked for better sanity check of the sigcontext
   information (restored pstate information)
 
 - ACPI parking protocol implementation
 
 - CONFIG_DEBUG_RODATA enabled by default
 
 - VDSO code marked as read-only
 
 - DEBUG_PAGEALLOC support
 
 - ARCH_HAS_UBSAN_SANITIZE_ALL enabled
 
 - Erratum workaround Cavium ThunderX SoC
 
 - set_pte_at() fix for PROT_NONE mappings
 
 - Code clean-ups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW6u95AAoJEGvWsS0AyF7xMyoP/3x2O6bgreSQ84BdO4JChN4+
 RQ9OVdX8u2ItO9sgaCY2AA6KoiBuEjGmPl/XRuK0I7DpODTtRjEXQHuNNhz8AelC
 hn4AEVqamY6Z5BzHFIjs8G9ydEbq+OXcKWEdwSsBhP/cMvI7ss3dps1f5iNPT5Vv
 50E/kUz+aWYy7pKlB18VDV7TUOA3SuYuGknWV8+bOY5uPb8hNT3Y3fHOg/EuNNN3
 DIuYH1V7XQkXtF+oNVIGxzzJCXULBE7egMcWAm1ydSOHK0JwkZAiL7OhI7ceVD0x
 YlDxBnqmi4cgzfBzTxITAhn3OParwN6udQprdF1WGtFF6fuY2eRDSH/L/iZoE4DY
 OulL951OsBtF8YC3+RKLk908/0bA2Uw8ftjCOFJTYbSnZBj1gWK41VkCYMEXiHQk
 EaN8+2Iw206iYIoyvdjGCLw7Y0oakDoVD9vmv12SOaHeQljTkjoN8oIlfjjKTeP7
 3AXj5v9BDMDVh40nkVayysRNvqe48Kwt9Wn0rhVTLxwdJEiFG/OIU6HLuTkretdN
 dcCNFSQrRieSFHpBK9G0vKIpIss1ZwLm8gjocVXH7VK4Mo/TNQe4p2/wAF29mq4r
 xu1UiXmtU3uWxiqZnt72LOYFCarQ0sFA5+pMEvF5W+NrVB0wGpXhcwm+pGsIi4IM
 LepccTgykiUBqW5TRzPz
 =/oS+
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:
 "Here are the main arm64 updates for 4.6.  There are some relatively
  intrusive changes to support KASLR, the reworking of the kernel
  virtual memory layout and initial page table creation.

  Summary:

   - Initial page table creation reworked to avoid breaking large block
     mappings (huge pages) into smaller ones.  The ARM architecture
     requires break-before-make in such cases to avoid TLB conflicts but
     that's not always possible on live page tables

   - Kernel virtual memory layout: the kernel image is no longer linked
     to the bottom of the linear mapping (PAGE_OFFSET) but at the bottom
     of the vmalloc space, allowing the kernel to be loaded (nearly)
     anywhere in physical RAM

   - Kernel ASLR: position independent kernel Image and modules being
     randomly mapped in the vmalloc space with the randomness is
     provided by UEFI (efi_get_random_bytes() patches merged via the
     arm64 tree, acked by Matt Fleming)

   - Implement relative exception tables for arm64, required by KASLR
     (initial code for ARCH_HAS_RELATIVE_EXTABLE added to lib/extable.c
     but actual x86 conversion to deferred to 4.7 because of the merge
     dependencies)

   - Support for the User Access Override feature of ARMv8.2: this
     allows uaccess functions (get_user etc.) to be implemented using
     LDTR/STTR instructions.  Such instructions, when run by the kernel,
     perform unprivileged accesses adding an extra level of protection.
     The set_fs() macro is used to "upgrade" such instruction to
     privileged accesses via the UAO bit

   - Half-precision floating point support (part of ARMv8.2)

   - Optimisations for CPUs with or without a hardware prefetcher (using
     run-time code patching)

   - copy_page performance improvement to deal with 128 bytes at a time

   - Sanity checks on the CPU capabilities (via CPUID) to prevent
     incompatible secondary CPUs from being brought up (e.g.  weird
     big.LITTLE configurations)

   - valid_user_regs() reworked for better sanity check of the
     sigcontext information (restored pstate information)

   - ACPI parking protocol implementation

   - CONFIG_DEBUG_RODATA enabled by default

   - VDSO code marked as read-only

   - DEBUG_PAGEALLOC support

   - ARCH_HAS_UBSAN_SANITIZE_ALL enabled

   - Erratum workaround Cavium ThunderX SoC

   - set_pte_at() fix for PROT_NONE mappings

   - Code clean-ups"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (99 commits)
  arm64: kasan: Fix zero shadow mapping overriding kernel image shadow
  arm64: kasan: Use actual memory node when populating the kernel image shadow
  arm64: Update PTE_RDONLY in set_pte_at() for PROT_NONE permission
  arm64: Fix misspellings in comments.
  arm64: efi: add missing frame pointer assignment
  arm64: make mrs_s prefixing implicit in read_cpuid
  arm64: enable CONFIG_DEBUG_RODATA by default
  arm64: Rework valid_user_regs
  arm64: mm: check at build time that PAGE_OFFSET divides the VA space evenly
  arm64: KVM: Move kvm_call_hyp back to its original localtion
  arm64: mm: treat memstart_addr as a signed quantity
  arm64: mm: list kernel sections in order
  arm64: lse: deal with clobbered IP registers after branch via PLT
  arm64: mm: dump: Use VA_START directly instead of private LOWEST_ADDR
  arm64: kconfig: add submenu for 8.2 architectural features
  arm64: kernel: acpi: fix ioremap in ACPI parking protocol cpu_postboot
  arm64: Add support for Half precision floating point
  arm64: Remove fixmap include fragility
  arm64: Add workaround for Cavium erratum 27456
  arm64: mm: Mark .rodata as RO
  ...
2016-03-17 20:03:47 -07:00
Kees Cook
4cc7ecb7f2 param: convert some "on"/"off" users to strtobool
This changes several users of manual "on"/"off" parsing to use
strtobool.

Some side-effects:
- these uses will now parse y/n/1/0 meaningfully too
- the early_param uses will now bubble up parse errors

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Amitkumar Karwar <akarwar@marvell.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Joe Perches <joe@perches.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Nishant Sarmukadam <nishants@marvell.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Steve French <sfrench@samba.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17 15:09:34 -07:00
Joonsoo Kim
fe896d1878 mm: introduce page reference manipulation functions
The success of CMA allocation largely depends on the success of
migration and key factor of it is page reference count.  Until now, page
reference is manipulated by direct calling atomic functions so we cannot
follow up who and where manipulate it.  Then, it is hard to find actual
reason of CMA allocation failure.  CMA allocation should be guaranteed
to succeed so finding offending place is really important.

In this patch, call sites where page reference is manipulated are
converted to introduced wrapper function.  This is preparation step to
add tracepoint to each page reference manipulation function.  With this
facility, we can easily find reason of CMA allocation failure.  There is
no functional change in this patch.

In addition, this patch also converts reference read sites.  It will
help a second step that renames page._count to something else and
prevents later attempt to direct access to it (Suggested by Andrew).

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17 15:09:34 -07:00
Kirill A. Shutemov
3ed3a4f0dd mm: cleanup *pte_alloc* interfaces
There are few things about *pte_alloc*() helpers worth cleaning up:

 - 'vma' argument is unused, let's drop it;

 - most __pte_alloc() callers do speculative check for pmd_none(),
   before taking ptl: let's introduce pte_alloc() macro which does
   the check.

   The only direct user of __pte_alloc left is userfaultfd, which has
   different expectation about atomicity wrt pmd.

 - pte_alloc_map() and pte_alloc_map_lock() are redefined using
   pte_alloc().

[sudeep.holla@arm.com: fix build for arm64 hugetlbpage]
[sfr@canb.auug.org.au: fix arch/arm/mm/mmu.c some more]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17 15:09:34 -07:00
Linus Torvalds
bb7aeae3d6 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security layer updates from James Morris:
 "There are a bunch of fixes to the TPM, IMA, and Keys code, with minor
  fixes scattered across the subsystem.

  IMA now requires signed policy, and that policy is also now measured
  and appraised"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (67 commits)
  X.509: Make algo identifiers text instead of enum
  akcipher: Move the RSA DER encoding check to the crypto layer
  crypto: Add hash param to pkcs1pad
  sign-file: fix build with CMS support disabled
  MAINTAINERS: update tpmdd urls
  MODSIGN: linux/string.h should be #included to get memcpy()
  certs: Fix misaligned data in extra certificate list
  X.509: Handle midnight alternative notation in GeneralizedTime
  X.509: Support leap seconds
  Handle ISO 8601 leap seconds and encodings of midnight in mktime64()
  X.509: Fix leap year handling again
  PKCS#7: fix unitialized boolean 'want'
  firmware: change kernel read fail to dev_dbg()
  KEYS: Use the symbol value for list size, updated by scripts/insert-sys-cert
  KEYS: Reserve an extra certificate symbol for inserting without recompiling
  modsign: hide openssl output in silent builds
  tpm_tis: fix build warning with tpm_tis_resume
  ima: require signed IMA policy
  ima: measure and appraise the IMA policy itself
  ima: load policy using path
  ...
2016-03-17 11:33:45 -07:00
Linus Torvalds
70477371dc Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
 "Here is the crypto update for 4.6:

  API:
   - Convert remaining crypto_hash users to shash or ahash, also convert
     blkcipher/ablkcipher users to skcipher.
   - Remove crypto_hash interface.
   - Remove crypto_pcomp interface.
   - Add crypto engine for async cipher drivers.
   - Add akcipher documentation.
   - Add skcipher documentation.

  Algorithms:
   - Rename crypto/crc32 to avoid name clash with lib/crc32.
   - Fix bug in keywrap where we zero the wrong pointer.

  Drivers:
   - Support T5/M5, T7/M7 SPARC CPUs in n2 hwrng driver.
   - Add PIC32 hwrng driver.
   - Support BCM6368 in bcm63xx hwrng driver.
   - Pack structs for 32-bit compat users in qat.
   - Use crypto engine in omap-aes.
   - Add support for sama5d2x SoCs in atmel-sha.
   - Make atmel-sha available again.
   - Make sahara hashing available again.
   - Make ccp hashing available again.
   - Make sha1-mb available again.
   - Add support for multiple devices in ccp.
   - Improve DMA performance in caam.
   - Add hashing support to rockchip"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (116 commits)
  crypto: qat - remove redundant arbiter configuration
  crypto: ux500 - fix checks of error code returned by devm_ioremap_resource()
  crypto: atmel - fix checks of error code returned by devm_ioremap_resource()
  crypto: qat - Change the definition of icp_qat_uof_regtype
  hwrng: exynos - use __maybe_unused to hide pm functions
  crypto: ccp - Add abstraction for device-specific calls
  crypto: ccp - CCP versioning support
  crypto: ccp - Support for multiple CCPs
  crypto: ccp - Remove check for x86 family and model
  crypto: ccp - memset request context to zero during import
  lib/mpi: use "static inline" instead of "extern inline"
  lib/mpi: avoid assembler warning
  hwrng: bcm63xx - fix non device tree compatibility
  crypto: testmgr - allow rfc3686 aes-ctr variants in fips mode.
  crypto: qat - The AE id should be less than the maximal AE number
  lib/mpi: Endianness fix
  crypto: rockchip - add hash support for crypto engine in rk3288
  crypto: xts - fix compile errors
  crypto: doc - add skcipher API documentation
  crypto: doc - update AEAD AD handling
  ...
2016-03-17 11:22:54 -07:00
Andy Lutomirski
c29016cf41 x86/iopl: Fix iopl capability check on Xen PV
iopl(3) is supposed to work if iopl is already 3, even if
unprivileged.  This didn't work right on Xen PV.  Fix it.

Reviewewd-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/8ce12013e6e4c0a44a97e316be4a6faff31bd5ea.1458162709.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-17 09:49:27 +01:00
Andy Lutomirski
b7a584598a x86/iopl/64: Properly context-switch IOPL on Xen PV
On Xen PV, regs->flags doesn't reliably reflect IOPL and the
exit-to-userspace code doesn't change IOPL.  We need to context
switch it manually.

I'm doing this without going through paravirt because this is
specific to Xen PV.  After the dust settles, we can merge this with
the 32-bit code, tidy up the iopl syscall implementation, and remove
the set_iopl pvop entirely.

Fixes XSA-171.

Reviewewd-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/693c3bd7aeb4d3c27c92c622b7d0f554a458173c.1458162709.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-17 09:49:26 +01:00
Ingo Molnar
00f5268501 Merge branch 'x86/cleanups' into x86/urgent
Pull in some merge window leftovers.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-17 09:44:57 +01:00
Linus Torvalds
8759957b77 libnvdimm for 4.6
1/ Asynchronous address range scrub:
 Given the capacities of next generation persistent memory devices a
 scrub operation to find all poison may take 10s of seconds.  We want
 this scrub work to be done asynchronously with the rest of system
 initialization, so we move it out of line from the NFIT probing, i.e.
 acpi_nfit_add().
 
 2/ Clear poison:
 ACPI 6.1 introduces the ability to send "clear error" commands to the
 ACPI0012:00 device representing the root of an "nvdimm bus".  Similar to
 relocating a bad block on a disk, this support clears media errors in
 response to a write.
 
 3/ Persistent memory resource tracking:
 A persistent memory range may be designated as simply "reserved" by
 platform firmware in the efi/e820 memory map.  Later when the NFIT
 driver loads it discovers that the range is "Persistent Memory".  The
 NFIT bus driver inserts a resource to advertise that "persistent"
 attribute in the system resource tree for /proc/iomem and
 kernel-internal usages.
 
 4/ Miscellaneous cleanups and fixes:
 Workaround section misaligned pmem ranges when allocating a struct page
 memmap, fix handling of the read-only case in the ioctl path, and clean
 up block device major number allocation.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW6E0QAAoJEB7SkWpmfYgCM9EP/Aibi3BAYlv6KeCgLFXxNIyR
 Y7rA0K5UiJwzQMWmo3xQ4EOvAHVCQ33cVEdXy0zJPLnzQ+GCvaMuD/pxOB+HoZWq
 qUYdVvNomh7VzZDkbONidjuk4kwNHq8HtOo1bdGlPiXjIWEh3uop/rIShPFsRp9i
 RVByTE/9TGoDQ9Q6Aakw1GlvT75tZ36ZqwkM2jyzu1a7fmqfkfAJjjDY6gzm3/fJ
 OVv1SDGwknoTPMZFoAh5iyrzHsShw1l1nZFhP4LiulSUEYv4B1I0YNvzbmY9EkgQ
 LHg/HChXpDCfQN/68k0W7OX6rYPSNjeiX0Y+kqc9owznA32lxsdSMUHcEnGz/3ZE
 2yy0XfGMHYsXaWI514dKp1LceTvWYsuQ+NtYnDzEwMch9YjAJpOkxaJTqoRjD0rI
 2yxPamLrF1RP7r0jUw2OiMBBpf/N6NvwbIUJ4ssR87ryA8axNcs8Teeu1lgDjajS
 Xp2AKP5ViWP+lGdAJBY/fa70nSL6oyrHQlzV/3zAPyrVyhAfOTc5mHamlvzYYSBJ
 EoHDG1A0diP/E4wdiVNrD2fcKie5Vmp4Ws59OCAM8PwOJRXyRGfVB7PP+Q1DSZlc
 Tsh0QFjfGQOhS02VEaQPm7A19BYFgpTMgU6YqPOPyqVYALIqzj21Ov7+2VI73FyG
 ORqEjCAxLVto+3gjN0oD
 =F67V
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:

 - Asynchronous address range scrub:

     Given the capacities of next generation persistent memory devices a
     scrub operation to find all poison may take 10s of seconds.  We
     want this scrub work to be done asynchronously with the rest of
     system initialization, so we move it out of line from the NFIT
     probing, i.e. acpi_nfit_add().

 - Clear poison:

     ACPI 6.1 introduces the ability to send "clear error" commands to
     the ACPI0012:00 device representing the root of an "nvdimm bus".
     Similar to relocating a bad block on a disk, this support clears
     media errors in response to a write.

 - Persistent memory resource tracking:

     A persistent memory range may be designated as simply "reserved" by
     platform firmware in the efi/e820 memory map.  Later when the NFIT
     driver loads it discovers that the range is "Persistent Memory".

     The NFIT bus driver inserts a resource to advertise that
     "persistent" attribute in the system resource tree for /proc/iomem
     and kernel-internal usages.

 - Miscellaneous cleanups and fixes:

     Workaround section misaligned pmem ranges when allocating a struct
     page memmap, fix handling of the read-only case in the ioctl path,
     and clean up block device major number allocation.

* tag 'libnvdimm-for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (26 commits)
  libnvdimm, pmem: clear poison on write
  libnvdimm, pmem: fix kmap_atomic() leak in error path
  nvdimm/btt: don't allocate unused major device number
  nvdimm/blk: don't allocate unused major device number
  pmem: don't allocate unused major device number
  ACPI: Change NFIT driver to insert new resource
  resource: Export insert_resource and remove_resource
  resource: Add remove_resource interface
  resource: Change __request_region to inherit from immediate parent
  libnvdimm, pmem: fix ia64 build, use PHYS_PFN
  nfit, libnvdimm: clear poison command support
  libnvdimm, pfn: 'resource'-address and 'size' attributes for pfn devices
  libnvdimm, pmem: adjust for section collisions with 'System RAM'
  libnvdimm, pmem: fix 'pfn' support for section-misaligned namespaces
  libnvdimm: Fix security issue with DSM IOCTL.
  libnvdimm: Clean-up access mode check.
  tools/testing/nvdimm: expand ars unit testing
  nfit: disable userspace initiated ars during scrub
  nfit: scrub and register regions in a workqueue
  nfit, libnvdimm: async region scrub workqueue
  ...
2016-03-16 17:45:56 -07:00
Linus Torvalds
63e30271b0 PCI changes for the v4.6 merge window:
Enumeration
     Disable IO/MEM decoding for devices with non-compliant BARs (Bjorn Helgaas)
     Mark Broadwell-EP Home Agent & PCU as having non-compliant BARs (Bjorn Helgaas
 
   Resource management
     Mark shadow copy of VGA ROM as IORESOURCE_PCI_FIXED (Bjorn Helgaas)
     Don't assign or reassign immutable resources (Bjorn Helgaas)
     Don't enable/disable ROM BAR if we're using a RAM shadow copy (Bjorn Helgaas)
     Set ROM shadow location in arch code, not in PCI core (Bjorn Helgaas)
     Remove arch-specific IORESOURCE_ROM_SHADOW size from sysfs (Bjorn Helgaas)
     ia64: Use ioremap() instead of open-coded equivalent (Bjorn Helgaas)
     ia64: Keep CPU physical (not virtual) addresses in shadow ROM resource (Bjorn Helgaas)
     MIPS: Keep CPU physical (not virtual) addresses in shadow ROM resource (Bjorn Helgaas)
     Remove unused IORESOURCE_ROM_COPY and IORESOURCE_ROM_BIOS_COPY (Bjorn Helgaas)
     Don't leak memory if sysfs_create_bin_file() fails (Bjorn Helgaas)
     rcar: Remove PCI_PROBE_ONLY handling (Lorenzo Pieralisi)
     designware: Remove PCI_PROBE_ONLY handling (Lorenzo Pieralisi)
 
   Virtualization
     Wait for up to 1000ms after FLR reset (Alex Williamson)
     Support SR-IOV on any function type (Kelly Zytaruk)
     Add ACS quirk for all Cavium devices (Manish Jaggi)
 
   AER
     Rename pci_ops_aer to aer_inj_pci_ops (Bjorn Helgaas)
     Restore pci_ops pointer while calling original pci_ops (David Daney)
     Fix aer_inject error codes (Jean Delvare)
     Use dev_warn() in aer_inject (Jean Delvare)
     Log actual error causes in aer_inject (Jean Delvare)
     Log aer_inject error injections (Jean Delvare)
 
   VPD
     Prevent VPD access for buggy devices (Babu Moger)
     Move pci_read_vpd() and pci_write_vpd() close to other VPD code (Bjorn Helgaas)
     Move pci_vpd_release() from header file to pci/access.c (Bjorn Helgaas)
     Remove struct pci_vpd_ops.release function pointer (Bjorn Helgaas)
     Rename VPD symbols to remove unnecessary "pci22" (Bjorn Helgaas)
     Fold struct pci_vpd_pci22 into struct pci_vpd (Bjorn Helgaas)
     Sleep rather than busy-wait for VPD access completion (Bjorn Helgaas)
     Update VPD definitions (Hannes Reinecke)
     Allow access to VPD attributes with size 0 (Hannes Reinecke)
     Determine actual VPD size on first access (Hannes Reinecke)
 
   Generic host bridge driver
     Move structure definitions to separate header file (David Daney)
     Add pci_host_common_probe(), based on gen_pci_probe() (David Daney)
     Expose pci_host_common_probe() for use by other drivers (David Daney)
 
   Altera host bridge driver
     Fix altera_pcie_link_is_up() (Ley Foon Tan)
 
   Cavium ThunderX host bridge driver
     Add PCIe host driver for ThunderX processors (David Daney)
     Add driver for ThunderX-pass{1,2} on-chip devices (David Daney)
 
   Freescale i.MX6 host bridge driver
     Add DT bindings to configure PHY Tx driver settings (Justin Waters)
     Move imx6_pcie_reset_phy() near other PHY handling functions (Lucas Stach)
     Move PHY reset into imx6_pcie_establish_link() (Lucas Stach)
     Remove broken Gen2 workaround (Lucas Stach)
     Move link up check into imx6_pcie_wait_for_link() (Lucas Stach)
 
   Freescale Layerscape host bridge driver
     Add "fsl,ls2085a-pcie" compatible ID (Yang Shi)
 
   Intel VMD host bridge driver
     Attach VMD resources to parent domain's resource tree (Jon Derrick)
     Set bus resource start to 0 (Keith Busch)
 
   Microsoft Hyper-V host bridge driver
     Add fwnode_handle to x86 pci_sysdata (Jake Oshins)
     Look up IRQ domain by fwnode_handle (Jake Oshins)
     Add paravirtual PCI front-end for Microsoft Hyper-V VMs (Jake Oshins)
 
   NVIDIA Tegra host bridge driver
     Add pci_ops.{add,remove}_bus() callbacks (Thierry Reding)
     Implement ->{add,remove}_bus() callbacks (Thierry Reding)
     Remove unused struct tegra_pcie.num_ports field (Thierry Reding)
     Track bus -> CPU mapping (Thierry Reding)
     Remove misleading PHYS_OFFSET (Thierry Reding)
 
   Renesas R-Car host bridge driver
     Depend on ARCH_RENESAS, not ARCH_SHMOBILE (Simon Horman)
 
   Synopsys DesignWare host bridge driver
     ARC: Add PCI support (Joao Pinto)
     Add generic dw_pcie_wait_for_link() (Joao Pinto)
     Add default link up check if sub-driver doesn't override (Joao Pinto)
     Add driver for prototyping kits based on ARC SDP (Joao Pinto)
 
   TI Keystone host bridge driver
     Defer probing if devm_phy_get() returns -EPROBE_DEFER (Shawn Lin)
 
   Xilinx AXI host bridge driver
     Use of_pci_get_host_bridge_resources() to parse DT (Bharat Kumar Gogada)
     Remove dependency on ARM-specific struct hw_pci (Bharat Kumar Gogada)
     Don't call pci_fixup_irqs() on Microblaze (Bharat Kumar Gogada)
     Update Zynq binding with Microblaze node (Bharat Kumar Gogada)
     microblaze: Support generic Xilinx AXI PCIe Host Bridge IP driver (Bharat Kumar Gogada)
 
   Xilinx NWL host bridge driver
     Add support for Xilinx NWL PCIe Host Controller (Bharat Kumar Gogada)
 
   Miscellaneous
     Check device_attach() return value always (Bjorn Helgaas)
     Move pci_set_flags() from asm-generic/pci-bridge.h to linux/pci.h (Bjorn Helgaas)
     Remove includes of empty asm-generic/pci-bridge.h (Bjorn Helgaas)
     ARM64: Remove generated include of asm-generic/pci-bridge.h (Bjorn Helgaas)
     Remove empty asm-generic/pci-bridge.h (Bjorn Helgaas)
     Remove includes of asm/pci-bridge.h (Bjorn Helgaas)
     Consolidate PCI DMA constants and interfaces in linux/pci-dma-compat.h (Bjorn Helgaas)
     unicore32: Remove unused HAVE_ARCH_PCI_SET_DMA_MASK definition (Bjorn Helgaas)
     Cleanup pci/pcie/Kconfig whitespace (Andreas Ziegler)
     Include pci/hotplug Kconfig directly from pci/Kconfig (Bjorn Helgaas)
     Include pci/pcie/Kconfig directly from pci/Kconfig (Bogicevic Sasa)
     frv: Remove stray pci_{alloc,free}_consistent() declaration (Christoph Hellwig)
     Move pci_dma_* helpers to common code (Christoph Hellwig)
     Add PCI_CLASS_SERIAL_USB_DEVICE definition (Heikki Krogerus)
     Add QEMU top-level IDs for (sub)vendor & device (Robin H. Johnson)
     Fix broken URL for Dell biosdevname (Naga Venkata Sai Indubhaskar Jupudi)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW6XgMAAoJEFmIoMA60/r8Yq4P/1nNwwZPikU+9Z8k0HyGPll6
 vqXBOYj/wlbAxJTzH2weaoyUamFrwvsKaO3Vap3xHkAeTFPD/Dp0TipCCNMrZ82Z
 j1y83JJpenkRyX6ifLARCNYpOtvnvgzSrO9x7Sb2Xfqb64dPb7+jGAfOpGNzhKsO
 n1nj/L7RGx8Q6fNFGf8ANMXKTsdkdL+1pdwegjUXmD5WdOT+oW8DmqVbhyfSKwl0
 E8r4Ml2lIg7Qd5Wu5iKMIBsR0+5HEyrwV7ch92wXChwKfoRwG70qnn7FGdc0y5ZB
 XvJuj8UD5UeMxEUeoRa9SwU6wWQT3Q9e6BzMS+P+43z36SPYjMfy/Xffv054z/bY
 rQomLjuGxNLESpmfNK5JfKxWoe2YNXjHQIDWMrAHyNlwdKJbYiwPcxnZJhvOa/eB
 p0QYcGS7O43STjibG9PZhzeq8tuSJRshxi0W6iB9QlqO8qs8nJQxIO+sZj/vl4yz
 lSnswWcV9062KITl8Fe9xDw244/RTz1xSVCdldlSoDhJyeMOjRvzS8raUMyyVmbA
 YULsI3l2iCl+fwDm/T21o7hJG966oYdAmgEv7lc7BWfgEAMg//LZXvMzVvrPFB2D
 R77u/0idtOciVJrmnO/x9DnQO2hzro9SLmVH6m0+0YU4wSSpZfGn98PCrtkatOAU
 c8zT9dJgyJVE3Z7cnPJ4
 =otsF
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.6-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
 "PCI changes for v4.6:

  Enumeration:
   - Disable IO/MEM decoding for devices with non-compliant BARs (Bjorn Helgaas)
   - Mark Broadwell-EP Home Agent & PCU as having non-compliant BARs (Bjorn Helgaas

  Resource management:
   - Mark shadow copy of VGA ROM as IORESOURCE_PCI_FIXED (Bjorn Helgaas)
   - Don't assign or reassign immutable resources (Bjorn Helgaas)
   - Don't enable/disable ROM BAR if we're using a RAM shadow copy (Bjorn Helgaas)
   - Set ROM shadow location in arch code, not in PCI core (Bjorn Helgaas)
   - Remove arch-specific IORESOURCE_ROM_SHADOW size from sysfs (Bjorn Helgaas)
   - ia64: Use ioremap() instead of open-coded equivalent (Bjorn Helgaas)
   - ia64: Keep CPU physical (not virtual) addresses in shadow ROM resource (Bjorn Helgaas)
   - MIPS: Keep CPU physical (not virtual) addresses in shadow ROM resource (Bjorn Helgaas)
   - Remove unused IORESOURCE_ROM_COPY and IORESOURCE_ROM_BIOS_COPY (Bjorn Helgaas)
   - Don't leak memory if sysfs_create_bin_file() fails (Bjorn Helgaas)
   - rcar: Remove PCI_PROBE_ONLY handling (Lorenzo Pieralisi)
   - designware: Remove PCI_PROBE_ONLY handling (Lorenzo Pieralisi)

  Virtualization:
   - Wait for up to 1000ms after FLR reset (Alex Williamson)
   - Support SR-IOV on any function type (Kelly Zytaruk)
   - Add ACS quirk for all Cavium devices (Manish Jaggi)

  AER:
   - Rename pci_ops_aer to aer_inj_pci_ops (Bjorn Helgaas)
   - Restore pci_ops pointer while calling original pci_ops (David Daney)
   - Fix aer_inject error codes (Jean Delvare)
   - Use dev_warn() in aer_inject (Jean Delvare)
   - Log actual error causes in aer_inject (Jean Delvare)
   - Log aer_inject error injections (Jean Delvare)

  VPD:
   - Prevent VPD access for buggy devices (Babu Moger)
   - Move pci_read_vpd() and pci_write_vpd() close to other VPD code (Bjorn Helgaas)
   - Move pci_vpd_release() from header file to pci/access.c (Bjorn Helgaas)
   - Remove struct pci_vpd_ops.release function pointer (Bjorn Helgaas)
   - Rename VPD symbols to remove unnecessary "pci22" (Bjorn Helgaas)
   - Fold struct pci_vpd_pci22 into struct pci_vpd (Bjorn Helgaas)
   - Sleep rather than busy-wait for VPD access completion (Bjorn Helgaas)
   - Update VPD definitions (Hannes Reinecke)
   - Allow access to VPD attributes with size 0 (Hannes Reinecke)
   - Determine actual VPD size on first access (Hannes Reinecke)

  Generic host bridge driver:
   - Move structure definitions to separate header file (David Daney)
   - Add pci_host_common_probe(), based on gen_pci_probe() (David Daney)
   - Expose pci_host_common_probe() for use by other drivers (David Daney)

  Altera host bridge driver:
   - Fix altera_pcie_link_is_up() (Ley Foon Tan)

  Cavium ThunderX host bridge driver:
   - Add PCIe host driver for ThunderX processors (David Daney)
   - Add driver for ThunderX-pass{1,2} on-chip devices (David Daney)

  Freescale i.MX6 host bridge driver:
   - Add DT bindings to configure PHY Tx driver settings (Justin Waters)
   - Move imx6_pcie_reset_phy() near other PHY handling functions (Lucas Stach)
   - Move PHY reset into imx6_pcie_establish_link() (Lucas Stach)
   - Remove broken Gen2 workaround (Lucas Stach)
   - Move link up check into imx6_pcie_wait_for_link() (Lucas Stach)

  Freescale Layerscape host bridge driver:
   - Add "fsl,ls2085a-pcie" compatible ID (Yang Shi)

  Intel VMD host bridge driver:
   - Attach VMD resources to parent domain's resource tree (Jon Derrick)
   - Set bus resource start to 0 (Keith Busch)

  Microsoft Hyper-V host bridge driver:
   - Add fwnode_handle to x86 pci_sysdata (Jake Oshins)
   - Look up IRQ domain by fwnode_handle (Jake Oshins)
   - Add paravirtual PCI front-end for Microsoft Hyper-V VMs (Jake Oshins)

  NVIDIA Tegra host bridge driver:
   - Add pci_ops.{add,remove}_bus() callbacks (Thierry Reding)
   - Implement ->{add,remove}_bus() callbacks (Thierry Reding)
   - Remove unused struct tegra_pcie.num_ports field (Thierry Reding)
   - Track bus -> CPU mapping (Thierry Reding)
   - Remove misleading PHYS_OFFSET (Thierry Reding)

  Renesas R-Car host bridge driver:
   - Depend on ARCH_RENESAS, not ARCH_SHMOBILE (Simon Horman)

  Synopsys DesignWare host bridge driver:
   - ARC: Add PCI support (Joao Pinto)
   - Add generic dw_pcie_wait_for_link() (Joao Pinto)
   - Add default link up check if sub-driver doesn't override (Joao Pinto)
   - Add driver for prototyping kits based on ARC SDP (Joao Pinto)

  TI Keystone host bridge driver:
   - Defer probing if devm_phy_get() returns -EPROBE_DEFER (Shawn Lin)

  Xilinx AXI host bridge driver:
   - Use of_pci_get_host_bridge_resources() to parse DT (Bharat Kumar Gogada)
   - Remove dependency on ARM-specific struct hw_pci (Bharat Kumar Gogada)
   - Don't call pci_fixup_irqs() on Microblaze (Bharat Kumar Gogada)
   - Update Zynq binding with Microblaze node (Bharat Kumar Gogada)
   - microblaze: Support generic Xilinx AXI PCIe Host Bridge IP driver (Bharat Kumar Gogada)

  Xilinx NWL host bridge driver:
   - Add support for Xilinx NWL PCIe Host Controller (Bharat Kumar Gogada)

  Miscellaneous:
   - Check device_attach() return value always (Bjorn Helgaas)
   - Move pci_set_flags() from asm-generic/pci-bridge.h to linux/pci.h (Bjorn Helgaas)
   - Remove includes of empty asm-generic/pci-bridge.h (Bjorn Helgaas)
   - ARM64: Remove generated include of asm-generic/pci-bridge.h (Bjorn Helgaas)
   - Remove empty asm-generic/pci-bridge.h (Bjorn Helgaas)
   - Remove includes of asm/pci-bridge.h (Bjorn Helgaas)
   - Consolidate PCI DMA constants and interfaces in linux/pci-dma-compat.h (Bjorn Helgaas)
   - unicore32: Remove unused HAVE_ARCH_PCI_SET_DMA_MASK definition (Bjorn Helgaas)
   - Cleanup pci/pcie/Kconfig whitespace (Andreas Ziegler)
   - Include pci/hotplug Kconfig directly from pci/Kconfig (Bjorn Helgaas)
   - Include pci/pcie/Kconfig directly from pci/Kconfig (Bogicevic Sasa)
   - frv: Remove stray pci_{alloc,free}_consistent() declaration (Christoph Hellwig)
   - Move pci_dma_* helpers to common code (Christoph Hellwig)
   - Add PCI_CLASS_SERIAL_USB_DEVICE definition (Heikki Krogerus)
   - Add QEMU top-level IDs for (sub)vendor & device (Robin H. Johnson)
   - Fix broken URL for Dell biosdevname (Naga Venkata Sai Indubhaskar Jupudi)"

* tag 'pci-v4.6-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (94 commits)
  PCI: Add PCI_CLASS_SERIAL_USB_DEVICE definition
  PCI: designware: Add driver for prototyping kits based on ARC SDP
  PCI: designware: Add default link up check if sub-driver doesn't override
  PCI: designware: Add generic dw_pcie_wait_for_link()
  PCI: Cleanup pci/pcie/Kconfig whitespace
  PCI: Simplify pci_create_attr() control flow
  PCI: Don't leak memory if sysfs_create_bin_file() fails
  PCI: Simplify sysfs ROM cleanup
  PCI: Remove unused IORESOURCE_ROM_COPY and IORESOURCE_ROM_BIOS_COPY
  MIPS: Loongson 3: Keep CPU physical (not virtual) addresses in shadow ROM resource
  MIPS: Loongson 3: Use temporary struct resource * to avoid repetition
  ia64/PCI: Keep CPU physical (not virtual) addresses in shadow ROM resource
  ia64/PCI: Use ioremap() instead of open-coded equivalent
  ia64/PCI: Use temporary struct resource * to avoid repetition
  PCI: Clean up pci_map_rom() whitespace
  PCI: Remove arch-specific IORESOURCE_ROM_SHADOW size from sysfs
  PCI: thunder: Add driver for ThunderX-pass{1,2} on-chip devices
  PCI: thunder: Add PCIe host driver for ThunderX processors
  PCI: generic: Expose pci_host_common_probe() for use by other drivers
  PCI: generic: Add pci_host_common_probe(), based on gen_pci_probe()
  ...
2016-03-16 14:45:55 -07:00
Linus Torvalds
277edbabf6 Power management and ACPI material for v4.6-rc1, part 1
- Redesign of cpufreq governors and the intel_pstate driver to
    make them use callbacks invoked by the scheduler to trigger CPU
    frequency evaluation instead of using per-CPU deferrable timers
    for that purpose (Rafael Wysocki).
 
  - Reorganization and cleanup of cpufreq governor code to make it
    more straightforward and fix some concurrency problems in it
    (Rafael Wysocki, Viresh Kumar).
 
  - Cleanup and improvements of locking in the cpufreq core (Viresh
    Kumar).
 
  - Assorted cleanups in the cpufreq core (Rafael Wysocki, Viresh
    Kumar, Eric Biggers).
 
  - intel_pstate driver updates including fixes, optimizations and a
    modification to make it enable enable hardware-coordinated P-state
    selection (HWP) by default if supported by the processor (Philippe
    Longepe, Srinivas Pandruvada, Rafael Wysocki, Viresh Kumar, Felipe
    Franciosi).
 
  - Operating Performance Points (OPP) framework updates to improve
    its handling of voltage regulators and device clocks and updates
    of the cpufreq-dt driver on top of that (Viresh Kumar, Jon Hunter).
 
  - Updates of the powernv cpufreq driver to fix initialization
    and cleanup problems in it and correct its worker thread handling
    with respect to CPU offline, new powernv_throttle tracepoint
    (Shilpasri Bhat).
 
  - ACPI cpufreq driver optimization and cleanup (Rafael Wysocki).
 
  - ACPICA updates including one fix for a regression introduced
    by previos changes in the ACPICA code (Bob Moore, Lv Zheng,
    David Box, Colin Ian King).
 
  - Support for installing ACPI tables from initrd (Lv Zheng).
 
  - Optimizations of the ACPI CPPC code (Prashanth Prakash, Ashwin
    Chaugule).
 
  - Support for _HID(ACPI0010) devices (ACPI processor containers)
    and ACPI processor driver cleanups (Sudeep Holla).
 
  - Support for ACPI-based enumeration of the AMBA bus (Graeme Gregory,
    Aleksey Makarov).
 
  - Modification of the ACPI PCI IRQ management code to make it treat
    255 in the Interrupt Line register as "not connected" on x86 (as
    per the specification) and avoid attempts to use that value as
    a valid interrupt vector (Chen Fan).
 
  - ACPI APEI fixes related to resource leaks (Josh Hunt).
 
  - Removal of modularity from a few ACPI drivers (BGRT, GHES,
    intel_pmic_crc) that cannot be built as modules in practice (Paul
    Gortmaker).
 
  - PNP framework update to make it treat ACPI_RESOURCE_TYPE_SERIAL_BUS
    as a valid resource type (Harb Abdulhamid).
 
  - New device ID (future AMD I2C controller) in the ACPI driver for
    AMD SoCs (APD) and in the designware I2C driver (Xiangliang Yu).
 
  - Assorted ACPI cleanups (Colin Ian King, Kaiyen Chang, Oleg Drokin).
 
  - cpuidle menu governor optimization to avoid a square root
    computation in it (Rasmus Villemoes).
 
  - Fix for potential use-after-free in the generic device properties
    framework (Heikki Krogerus).
 
  - Updates of the generic power domains (genpd) framework including
    support for multiple power states of a domain, fixes and debugfs
    output improvements (Axel Haslam, Jon Hunter, Laurent Pinchart,
    Geert Uytterhoeven).
 
  - Intel RAPL power capping driver updates to reduce IPI overhead in
    it (Jacob Pan).
 
  - System suspend/hibernation code cleanups (Eric Biggers, Saurabh
    Sengar).
 
  - Year 2038 fix for the process freezer (Abhilash Jindal).
 
  - turbostat utility updates including new features (decoding of more
    registers and CPUID fields, sub-second intervals support, GFX MHz
    and RC6 printout, --out command line option), fixes (syscall jitter
    detection and workaround, reductioin of the number of syscalls made,
    fixes related to Xeon x200 processors, compiler warning fixes) and
    cleanups (Len Brown, Hubert Chrzaniuk, Chen Yu).
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJW50NXAAoJEILEb/54YlRxvr8QAIktC9+ft0y5AmU46hDcBWcK
 QutyWJL9X9BS6DWBJZA2qclDYFmhMfi5Fza1se0gQ9TnLB/KrBwHWLsiYoTsb1k+
 nPKf214aPk+qAhkVuyB4leNWML9Qz9n9jwku/EYxWWpgtbSRf3+0ioIKZeWWc/8V
 JvuaOu4O+g/tkmL7QTrnGWBwhIIssAAV85QPsHkx+g68MrCj4UMMzm7z9G21SPXX
 bmP8yIHsczX/XnRsY0W2NSno7Vdk6ImHpDJ26IAZg28WRNPWICHgGYHvB0TTWMvb
 tts+yqfF7/7QLRjT/M8k9CzDBDE/DnVqoZ0fNJ+aYr7hNKF32mtAN+jH9ZB9dl/P
 fEFapJkPxnWyzAoVoB9Dz0rkcZkYMlbxlLWzUGpaPq0JflUUTzLk0ApSjmMn4HRO
 UddwCDdyHTaYThp3gn6GbOb0pIP0SdOVbI1M2QV2x/4PLcT2Ft8Np1+1RFWOeinZ
 Bdl9AE890big0808mqbBzw/buETwr9FjHtCdDPXpP0vJpkBLu3nIYRNb0LCt39es
 mWMp6dFhGgvGj3D3ahTuV3GI8hdpDkh9SObexa11RCjkTKrXcwEmFxHxLeFXwKYq
 alG278bo6cSChRMziS1lis+W/3tsJRN4TXUSv1PPzJHrFgptQVFRStU9ngBKP+pN
 WB+itPc4Fw0YHOrAFsrx
 =cfty
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-4.6-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management and ACPI updates from Rafael Wysocki:
 "This time the majority of changes go into cpufreq and they are
  significant.

  First off, the way CPU frequency updates are triggered is different
  now.  Instead of having to set up and manage a deferrable timer for
  each CPU in the system to evaluate and possibly change its frequency
  periodically, cpufreq governors set up callbacks to be invoked by the
  scheduler on a regular basis (basically on utilization updates).  The
  "old" governors, "ondemand" and "conservative", still do all of their
  work in process context (although that is triggered by the scheduler
  now), but intel_pstate does it all in the callback invoked by the
  scheduler with no need for any additional asynchronous processing.

  Of course, this eliminates the overhead related to the management of
  all those timers, but also it allows the cpufreq governor code to be
  simplified quite a bit.  On top of that, the common code and data
  structures used by the "ondemand" and "conservative" governors are
  cleaned up and made more straightforward and some long-standing and
  quite annoying problems are addressed.  In particular, the handling of
  governor sysfs attributes is modified and the related locking becomes
  more fine grained which allows some concurrency problems to be avoided
  (particularly deadlocks with the core cpufreq code).

  In principle, the new mechanism for triggering frequency updates
  allows utilization information to be passed from the scheduler to
  cpufreq.  Although the current code doesn't make use of it, in the
  works is a new cpufreq governor that will make decisions based on the
  scheduler's utilization data.  That should allow the scheduler and
  cpufreq to work more closely together in the long run.

  In addition to the core and governor changes, cpufreq drivers are
  updated too.  Fixes and optimizations go into intel_pstate, the
  cpufreq-dt driver is updated on top of some modification in the
  Operating Performance Points (OPP) framework and there are fixes and
  other updates in the powernv cpufreq driver.

  Apart from the cpufreq updates there is some new ACPICA material,
  including a fix for a problem introduced by previous ACPICA updates,
  and some less significant changes in the ACPI code, like CPPC code
  optimizations, ACPI processor driver cleanups and support for loading
  ACPI tables from initrd.

  Also updated are the generic power domains framework, the Intel RAPL
  power capping driver and the turbostat utility and we have a bunch of
  traditional assorted fixes and cleanups.

  Specifics:

   - Redesign of cpufreq governors and the intel_pstate driver to make
     them use callbacks invoked by the scheduler to trigger CPU
     frequency evaluation instead of using per-CPU deferrable timers for
     that purpose (Rafael Wysocki).

   - Reorganization and cleanup of cpufreq governor code to make it more
     straightforward and fix some concurrency problems in it (Rafael
     Wysocki, Viresh Kumar).

   - Cleanup and improvements of locking in the cpufreq core (Viresh
     Kumar).

   - Assorted cleanups in the cpufreq core (Rafael Wysocki, Viresh
     Kumar, Eric Biggers).

   - intel_pstate driver updates including fixes, optimizations and a
     modification to make it enable enable hardware-coordinated P-state
     selection (HWP) by default if supported by the processor (Philippe
     Longepe, Srinivas Pandruvada, Rafael Wysocki, Viresh Kumar, Felipe
     Franciosi).

   - Operating Performance Points (OPP) framework updates to improve its
     handling of voltage regulators and device clocks and updates of the
     cpufreq-dt driver on top of that (Viresh Kumar, Jon Hunter).

   - Updates of the powernv cpufreq driver to fix initialization and
     cleanup problems in it and correct its worker thread handling with
     respect to CPU offline, new powernv_throttle tracepoint (Shilpasri
     Bhat).

   - ACPI cpufreq driver optimization and cleanup (Rafael Wysocki).

   - ACPICA updates including one fix for a regression introduced by
     previos changes in the ACPICA code (Bob Moore, Lv Zheng, David Box,
     Colin Ian King).

   - Support for installing ACPI tables from initrd (Lv Zheng).

   - Optimizations of the ACPI CPPC code (Prashanth Prakash, Ashwin
     Chaugule).

   - Support for _HID(ACPI0010) devices (ACPI processor containers) and
     ACPI processor driver cleanups (Sudeep Holla).

   - Support for ACPI-based enumeration of the AMBA bus (Graeme Gregory,
     Aleksey Makarov).

   - Modification of the ACPI PCI IRQ management code to make it treat
     255 in the Interrupt Line register as "not connected" on x86 (as
     per the specification) and avoid attempts to use that value as a
     valid interrupt vector (Chen Fan).

   - ACPI APEI fixes related to resource leaks (Josh Hunt).

   - Removal of modularity from a few ACPI drivers (BGRT, GHES,
     intel_pmic_crc) that cannot be built as modules in practice (Paul
     Gortmaker).

   - PNP framework update to make it treat ACPI_RESOURCE_TYPE_SERIAL_BUS
     as a valid resource type (Harb Abdulhamid).

   - New device ID (future AMD I2C controller) in the ACPI driver for
     AMD SoCs (APD) and in the designware I2C driver (Xiangliang Yu).

   - Assorted ACPI cleanups (Colin Ian King, Kaiyen Chang, Oleg Drokin).

   - cpuidle menu governor optimization to avoid a square root
     computation in it (Rasmus Villemoes).

   - Fix for potential use-after-free in the generic device properties
     framework (Heikki Krogerus).

   - Updates of the generic power domains (genpd) framework including
     support for multiple power states of a domain, fixes and debugfs
     output improvements (Axel Haslam, Jon Hunter, Laurent Pinchart,
     Geert Uytterhoeven).

   - Intel RAPL power capping driver updates to reduce IPI overhead in
     it (Jacob Pan).

   - System suspend/hibernation code cleanups (Eric Biggers, Saurabh
     Sengar).

   - Year 2038 fix for the process freezer (Abhilash Jindal).

   - turbostat utility updates including new features (decoding of more
     registers and CPUID fields, sub-second intervals support, GFX MHz
     and RC6 printout, --out command line option), fixes (syscall jitter
     detection and workaround, reductioin of the number of syscalls
     made, fixes related to Xeon x200 processors, compiler warning
     fixes) and cleanups (Len Brown, Hubert Chrzaniuk, Chen Yu)"

* tag 'pm+acpi-4.6-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (182 commits)
  tools/power turbostat: bugfix: TDP MSRs print bits fixing
  tools/power turbostat: correct output for MSR_NHM_SNB_PKG_CST_CFG_CTL dump
  tools/power turbostat: call __cpuid() instead of __get_cpuid()
  tools/power turbostat: indicate SMX and SGX support
  tools/power turbostat: detect and work around syscall jitter
  tools/power turbostat: show GFX%rc6
  tools/power turbostat: show GFXMHz
  tools/power turbostat: show IRQs per CPU
  tools/power turbostat: make fewer systems calls
  tools/power turbostat: fix compiler warnings
  tools/power turbostat: add --out option for saving output in a file
  tools/power turbostat: re-name "%Busy" field to "Busy%"
  tools/power turbostat: Intel Xeon x200: fix turbo-ratio decoding
  tools/power turbostat: Intel Xeon x200: fix erroneous bclk value
  tools/power turbostat: allow sub-sec intervals
  ACPI / APEI: ERST: Fixed leaked resources in erst_init
  ACPI / APEI: Fix leaked resources
  intel_pstate: Do not skip samples partially
  intel_pstate: Remove freq calculation from intel_pstate_calc_busy()
  intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()
  ...
2016-03-16 14:10:53 -07:00
Linus Torvalds
271ecc5253 Merge branch 'akpm' (patches from Andrew)
Merge first patch-bomb from Andrew Morton:

 - some misc things

 - ofs2 updates

 - about half of MM

 - checkpatch updates

 - autofs4 update

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (120 commits)
  autofs4: fix string.h include in auto_dev-ioctl.h
  autofs4: use pr_xxx() macros directly for logging
  autofs4: change log print macros to not insert newline
  autofs4: make autofs log prints consistent
  autofs4: fix some white space errors
  autofs4: fix invalid ioctl return in autofs4_root_ioctl_unlocked()
  autofs4: fix coding style line length in autofs4_wait()
  autofs4: fix coding style problem in autofs4_get_set_timeout()
  autofs4: coding style fixes
  autofs: show pipe inode in mount options
  kallsyms: add support for relative offsets in kallsyms address table
  kallsyms: don't overload absolute symbol type for percpu symbols
  x86: kallsyms: disable absolute percpu symbols on !SMP
  checkpatch: fix another left brace warning
  checkpatch: improve UNSPECIFIED_INT test for bare signed/unsigned uses
  checkpatch: warn on bare unsigned or signed declarations without int
  checkpatch: exclude asm volatile from complex macro check
  mm: memcontrol: drop unnecessary lru locking from mem_cgroup_migrate()
  mm: migrate: consolidate mem_cgroup_migrate() calls
  mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
  ...
2016-03-16 11:51:08 -07:00
Linus Torvalds
10dc374766 One of the largest releases for KVM... Hardly any generic improvement,
but lots of architecture-specific changes.
 
 * ARM:
 - VHE support so that we can run the kernel at EL2 on ARMv8.1 systems
 - PMU support for guests
 - 32bit world switch rewritten in C
 - various optimizations to the vgic save/restore code.
 
 * PPC:
 - enabled KVM-VFIO integration ("VFIO device")
 - optimizations to speed up IPIs between vcpus
 - in-kernel handling of IOMMU hypercalls
 - support for dynamic DMA windows (DDW).
 
 * s390:
 - provide the floating point registers via sync regs;
 - separated instruction vs. data accesses
 - dirty log improvements for huge guests
 - bugfixes and documentation improvements.
 
 * x86:
 - Hyper-V VMBus hypercall userspace exit
 - alternative implementation of lowest-priority interrupts using vector
 hashing (for better VT-d posted interrupt support)
 - fixed guest debugging with nested virtualizations
 - improved interrupt tracking in the in-kernel IOAPIC
 - generic infrastructure for tracking writes to guest memory---currently
 its only use is to speedup the legacy shadow paging (pre-EPT) case, but
 in the future it will be used for virtual GPUs as well
 - much cleanup (LAPIC, kvmclock, MMU, PIT), including ubsan fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJW5r3BAAoJEL/70l94x66D2pMH/jTSWWwdTUJMctrDjPVzKzG0
 yOzHW5vSLFoFlwEOY2VpslnXzn5TUVmCAfrdmFNmQcSw6hGb3K/xA/ZX/KLwWhyb
 oZpr123ycahga+3q/ht/dFUBCCyWeIVMdsLSFwpobEBzPL0pMgc9joLgdUC6UpWX
 tmN0LoCAeS7spC4TTiTTpw3gZ/L+aB0B6CXhOMjldb9q/2CsgaGyoVvKA199nk9o
 Ngu7ImDt7l/x1VJX4/6E/17VHuwqAdUrrnbqerB/2oJ5ixsZsHMGzxQ3sHCmvyJx
 WG5L00ubB1oAJAs9fBg58Y/MdiWX99XqFhdEfxq4foZEiQuCyxygVvq3JwZTxII=
 =OUZZ
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "One of the largest releases for KVM...  Hardly any generic
  changes, but lots of architecture-specific updates.

  ARM:
   - VHE support so that we can run the kernel at EL2 on ARMv8.1 systems
   - PMU support for guests
   - 32bit world switch rewritten in C
   - various optimizations to the vgic save/restore code.

  PPC:
   - enabled KVM-VFIO integration ("VFIO device")
   - optimizations to speed up IPIs between vcpus
   - in-kernel handling of IOMMU hypercalls
   - support for dynamic DMA windows (DDW).

  s390:
   - provide the floating point registers via sync regs;
   - separated instruction vs.  data accesses
   - dirty log improvements for huge guests
   - bugfixes and documentation improvements.

  x86:
   - Hyper-V VMBus hypercall userspace exit
   - alternative implementation of lowest-priority interrupts using
     vector hashing (for better VT-d posted interrupt support)
   - fixed guest debugging with nested virtualizations
   - improved interrupt tracking in the in-kernel IOAPIC
   - generic infrastructure for tracking writes to guest
     memory - currently its only use is to speedup the legacy shadow
     paging (pre-EPT) case, but in the future it will be used for
     virtual GPUs as well
   - much cleanup (LAPIC, kvmclock, MMU, PIT), including ubsan fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (217 commits)
  KVM: x86: remove eager_fpu field of struct kvm_vcpu_arch
  KVM: x86: disable MPX if host did not enable MPX XSAVE features
  arm64: KVM: vgic-v3: Only wipe LRs on vcpu exit
  arm64: KVM: vgic-v3: Reset LRs at boot time
  arm64: KVM: vgic-v3: Do not save an LR known to be empty
  arm64: KVM: vgic-v3: Save maintenance interrupt state only if required
  arm64: KVM: vgic-v3: Avoid accessing ICH registers
  KVM: arm/arm64: vgic-v2: Make GICD_SGIR quicker to hit
  KVM: arm/arm64: vgic-v2: Only wipe LRs on vcpu exit
  KVM: arm/arm64: vgic-v2: Reset LRs at boot time
  KVM: arm/arm64: vgic-v2: Do not save an LR known to be empty
  KVM: arm/arm64: vgic-v2: Move GICH_ELRSR saving to its own function
  KVM: arm/arm64: vgic-v2: Save maintenance interrupt state only if required
  KVM: arm/arm64: vgic-v2: Avoid accessing GICH registers
  KVM: s390: allocate only one DMA page per VM
  KVM: s390: enable STFLE interpretation only if enabled for the guest
  KVM: s390: wake up when the VCPU cpu timer expires
  KVM: s390: step the VCPU timer while in enabled wait
  KVM: s390: protect VCPU cpu timer with a seqcount
  KVM: s390: step VCPU cpu timer during kvm_run ioctl
  ...
2016-03-16 09:55:35 -07:00